Quantum risk-sensitive estimation and robustness

Naoki Yamamoto† and Luc Bouten*

Abstract 

This paper studies a quantum risk-sensitive estimation problem and investigates robustness 
properties of the filter. This is a direct extension to the quantum case of analogous classical 
results. All investigations are based on a discrete approximation model of the quantum system 
under consideration. This allows us to study the problem in a simple mathematical setting. 
We close the paper with some examples that demonstrate the robustness of the risk-sensitive 
estimator. 

I. Introduction 

Filtering, which in a broad sense is a method for extracting information from a noisy signal, 
is one of the principal tools in modern engineering science. In particular, when considering 
a partially observed dynamical system, we can construct an optimal filter that computes the
least-squares estimate of a state variable of the dynamics. In the linear case, this results in the
so-called Kalman filter [28]. This dynamical filtering theory was rigorously established using
classical Kolmogorov probability theory and its application to the theory of stochastic
differential equations (e.g. [29]). Moreover, the well-known separation theorem [44] states
that the solution of a general optimal control problem for a partially observed system can be
represented in terms of a corresponding information state of the filter. For this reason,
filtering theory is not only important in itself, but also essential in feedback control theory.

The situation is much the same in quantum mechanics. The Heisenberg uncertainty principle 
shows that any quantum system must possess fundamental uncertainty originating from the 
noncommutativity of its random variables. Therefore, we can never have complete observation 
in the quantum setting, which implies the necessity of filtering in the quantum case. Fortunately, 
there exists a quantum filtering theory as a beautiful parallel to the classical one. The theory 
was pioneered by Belavkin in the remarkable papers [4], [5], [6], and the quantum filtering 
equation or stochastic master equation is now widely used in the physics community [1], [8], 
[16], [21], [31], [38], [40], [47]. Moreover, as in the classical theory, it is possible to show that 
a separation principle holds in the quantum case [10]. 

Filtering is thus, in both the classical and the quantum case, an important tool in control
theory. However, the optimal filter is in general quite fragile against unmodeled uncertainty
in the system, and consequently the estimation performance can degrade severely. This calls
for a theory of robust estimation that tolerates some model uncertainty while still
guaranteeing high-quality estimation performance. Guaranteed-cost
filtering [34], [45] is one such robust estimation method in the classical theory. It guarantees 
that the variance of the estimation error is within a certain bound even when the linear system 
under consideration includes unknown parameters. Moreover, risk-sensitive filtering [13], [15], 
[32], [36] is known as a very efficient robust estimation method, for a wide class of classical 
linear and nonlinear systems [7], [42], [48]. Recently, one of the authors has obtained a quantum 
version of the guaranteed-cost filter mentioned above [46]. In this paper, we develop a quantum 
risk-sensitive estimation theory. 

Let us first briefly introduce the classical theory of risk-sensitive estimation. 
A. Classical risk-sensitive estimation

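We are given a probability space $(\Omega, \mathcal{F}, \mathbf{P})$ and a signal model of a discrete-time system

$$x_l = a(x_{l-1}) + b(x_{l-1})\,w_l, \qquad y_l = c(x_{l-1}) + v_l, \tag{1}$$

where $x_l$ is the signal state, $y_l$ is the output, and $w_l, v_l$ are i.i.d. Gaussian random processes.
A version of the risk-sensitive estimator of $x_l$ is defined as

$$\hat{x}^{\mu}_l := \operatorname*{argmin}_{z_l \in \mathcal{Y}_l} E_{\mathbf{P}}[\Psi(z_l)], \qquad \Psi(z_l) := \exp\Big[\mu_1 \sum_{i=1}^{l-1} (x_i - \hat{x}^{\mu}_i)^2 + \mu_2 (x_l - z_l)^2\Big], \tag{2}$$

where $\mathcal{Y}_l = \sigma\{y_i : 1 \le i \le l\}$ is the $\sigma$-algebra generated from the observation and $\mu = (\mu_1, \mu_2)$ are the weighting constants called the risk-sensitive parameters. Moreover, we use
the notation $z_l \in \mathcal{Y}_l$ to indicate that $z_l$ is a bounded $\mathcal{Y}_l$-measurable function. The risk-sensitive
estimator $\hat{x}^{\mu}_l$ can be represented by $\hat{x}^{\mu}_l = \operatorname{argmin}_{z_l \in \mathcal{Y}_l} f_1(z_l, \alpha^{\mu}_l)$, where $f_1$ is a certain function
and $\alpha^{\mu}_l(x)$ is an information state defined by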



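$$E_{\mathbf{Q}}\Big[\Lambda_l \exp\Big(\mu_1 \sum_{i=1}^{l-1} (x_i - \hat{x}^{\mu}_i)^2\Big)\,\zeta(x_l) \,\Big|\, \mathcal{Y}_l\Big] = \int \zeta(x)\,\alpha^{\mu}_l(x)\,dx, \tag{3}$$

for all test functions $\zeta(x)$. Here, $\mathbf{Q}$ is a probability measure defined by

$$\Lambda_l := \frac{d\mathbf{P}}{d\mathbf{Q}} = \prod_{i=1}^{l} \exp\Big[c(x_{i-1})\,y_i - \frac{1}{2}\,c(x_{i-1})^2\Big]. \tag{4}$$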



Moreover, $\alpha^{\mu}_l(x)$ satisfies a recursive equation of the form $\alpha^{\mu}_l = f_2(\alpha^{\mu}_{l-1}, \hat{x}^{\mu}_{l-1}, y_l)$. Hence,
running this equation with the measurement data $y_l$, we can recursively calculate $f_1(z_l, \alpha^{\mu}_l)$
and obtain the minimizer of this function, i.e., $\hat{x}^{\mu}_l$.

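Note that $\hat{x}^{\mu}_l$ differs from the standard optimal (or risk-neutral) estimator $\hat{x}_l := \operatorname{argmin}_{z_l \in \mathcal{Y}_l} E_{\mathbf{P}}[(x_l - z_l)^2]$ and is thus not optimal in the sense of the mean square error. However, the risk-sensitive estimator has a clear advantage over the risk-neutral one when we consider
an uncertain system. This can be seen as follows. If the true probability measure $\mathbf{P}_{\rm true}$ is
unknown, then we need to use a known nominal measure $\mathbf{P}_{\rm nom}$ and design a nominal filter
based on $\mathbf{P}_{\rm nom}$. However, since $\mathbf{P}_{\rm true} \neq \mathbf{P}_{\rm nom}$, there is no guarantee that the nominal estimator
yields a bounded estimation error. The risk-sensitive estimator overcomes this issue. That
is, the nominal risk-sensitive estimator $\hat{x}^{\mu,{\rm nom}}_l$ (i.e., based on $\mathbf{P}_{\rm nom}$) satisfies

$$E_{\mathbf{P}_{\rm true}}\Big[\mu_1 \sum_{i=1}^{l-1} (x_i - \hat{x}^{\mu,{\rm nom}}_i)^2 + \mu_2 (x_l - \hat{x}^{\mu,{\rm nom}}_l)^2\Big] \le \log E_{\mathbf{P}_{\rm nom}}[\Psi(\hat{x}^{\mu,{\rm nom}}_l)] + R_c(\mathbf{P}_{\rm true} \| \mathbf{P}_{\rm nom}), \tag{5}$$

where $R_c(\mathbf{Q} \| \mathbf{P}) := \int \log(d\mathbf{Q}/d\mathbf{P})\,d\mathbf{Q}$ is the classical relative entropy of $\mathbf{Q}$ and $\mathbf{P}$. Eq. (5)
implies that the unknown true estimation error is bounded if $R_c(\mathbf{P}_{\rm true} \| \mathbf{P}_{\rm nom})$ is finite. This
robustness property is derived using the following duality relation (e.g. [17]) of two measures
$\mathbf{P}$ and $\mathbf{Q}$:

$$\log E_{\mathbf{P}}[e^{\psi}] = \sup_{\mathbf{Q}} \big\{ E_{\mathbf{Q}}(\psi) - R_c(\mathbf{Q} \| \mathbf{P}) : \mathbf{Q} \ll \mathbf{P} \big\}, \tag{6}$$

where $\mathbf{Q} \ll \mathbf{P}$ means that $\mathbf{Q}$ is absolutely continuous with respect to $\mathbf{P}$.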
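The duality relation (6) can be checked numerically on a finite sample space. The following is a minimal sketch, not part of the original derivation: the five-point sample space, the randomly drawn measure and random variable, and all function names are assumptions made for illustration. It uses the fact that the supremum is attained by the measure proportional to $e^{\psi}\,d\mathbf{P}$.

```python
import numpy as np

def relative_entropy(q, p):
    """Classical relative entropy R_c(Q||P) = sum_i q_i log(q_i / p_i) on a finite sample space."""
    q, p = np.asarray(q, dtype=float), np.asarray(p, dtype=float)
    support = q > 0
    if np.any(p[support] == 0):
        return np.inf  # Q is not absolutely continuous with respect to P
    return float(np.sum(q[support] * np.log(q[support] / p[support])))

def log_moment(psi, p):
    """Left-hand side of Eq. (6): log E_P[exp(psi)]."""
    return float(np.log(np.sum(p * np.exp(psi))))

def tilted_measure(psi, p):
    """Measure proportional to exp(psi) * P; it attains the supremum in Eq. (6)."""
    w = p * np.exp(psi)
    return w / w.sum()

rng = np.random.default_rng(0)
p = rng.dirichlet(np.ones(5))        # hypothetical nominal measure P on five points
psi = rng.normal(size=5)             # hypothetical bounded random variable psi

lhs = log_moment(psi, p)
q_star = tilted_measure(psi, p)
value_at_q_star = float(q_star @ psi) - relative_entropy(q_star, p)
q_other = rng.dirichlet(np.ones(5))  # an arbitrary competitor Q
value_at_other = float(q_other @ psi) - relative_entropy(q_other, p)
```

Any other measure, such as `q_other`, gives a value of $E_{\mathbf{Q}}(\psi) - R_c(\mathbf{Q}\|\mathbf{P})$ no larger than `lhs`, which is the mechanism behind the bound (5).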



B. Organization of the paper 

This paper provides a quantum version of the risk-sensitive estimation method presented 
above and shows its robustness properties against system uncertainty. The systems we consider 
are taken from quantum optics and consist of a quantum system in interaction with the quantized 
electromagnetic field. The field is described by a discretized model [9] that converges to a 
quantum stochastic dynamics [23] when the discretization step is taken to zero [2], [3], [11], 
[20], [30]. The discretized model has the advantage of being very tractable mathematically. 
The estimator is based on the risk-sensitive information state introduced by James [14], [24] 
in the context of quantum risk-sensitive control. We derive a bound on the estimation error in 
the presence of uncertainty. We illustrate the robustness of the estimator by simulations. 

The paper is organized as follows. In Section II we introduce quantum probability in a 
finite dimensional context and a duality relation that will lead to the robustness property of 
the estimator. Section III describes a discrete approximation model of the field. 
Section IV introduces the notion of composition of an operator and an operator valued function. 
In Section V we introduce the risk-sensitive estimator and derive the filter propagating the risk- 
sensitive information state. Section VI introduces a class of uncertain systems and derives a 
bound on the estimation error, showing robustness. In Section VII we present the results from 
simulations. 



II. Quantum probability theory 
A. Quantum probability space 

In quantum mechanics, a random variable is represented by a linear self-adjoint operator
on a Hilbert space. Due to the noncommutativity of such operators, we need to replace the
conventional notion of a classical probability space $(\Omega, \mathcal{F}, \mathbf{P})$ by the notion of a quantum
probability space defined below.

Definition 2.1 (*-algebra): Let $\mathsf{H}$ be a finite-dimensional complex Hilbert space. A *-algebra
$\mathcal{A}$ is a set of linear operators $\mathsf{H} \to \mathsf{H}$ such that $I, \alpha A + \beta B, AB, A^* \in \mathcal{A}$ for any $A, B \in \mathcal{A}$
and $\alpha, \beta \in \mathbb{C}$. $\mathcal{A}$ is called commutative if $[A, B] = AB - BA = 0$ for any $A, B \in \mathcal{A}$.

Definition 2.2 (State): A state on $\mathcal{A}$ is a linear map $\mathbb{P} : \mathcal{A} \to \mathbb{C}$ that is positive, $\mathbb{P}(A^*A) \ge 0$
$\forall A \in \mathcal{A}$, and normalized, $\mathbb{P}(I) = 1$.

Let $d$ be the dimension of $\mathsf{H}$ and let $(e_1, \ldots, e_d)$ be an orthonormal basis of $\mathsf{H}$. The trace is
the linear functional defined by $\mathrm{Tr}(A) = \sum_{i=1}^{d} (e_i, A e_i)$ for all $A \in \mathcal{A}$. It is well known that this definition
does not depend on the basis.

Definition 2.3 (Quantum probability space): Let $\mathcal{A}$ be a *-algebra of operators on a
finite-dimensional complex Hilbert space $\mathsf{H}$ and $\mathbb{P}$ be a state on $\mathcal{A}$. Then, $(\mathcal{A}, \mathbb{P})$ is called a
(finite-dimensional) quantum probability space.

Let (A, P) be a quantum probability space. A self-adjoint element of A is called a quantum 
random variable or observable. If A is a commutative *-algebra, then we call (A, P) a com- 
mutative quantum probability space. In this case, all quantum random variables in A commute 
with each other, which is the same as in the classical case. It is therefore not surprising that 
a commutative quantum probability space is equivalent to a classical one. A formal statement 
of this assertion is provided by the well known spectral theorem (Theorem 2.1 below). Note 
that in the finite dimensional setting of this article the spectral theorem follows trivially from 
diagonalizing the operators in A (see the proof of Theorem 2.1 below). In an infinite dimensional 
setting an analogous result, which is closely related to Gelfand's Theorem for commutative C*- 
algebras (see e.g. [35]), is true. 

Definition 2.4 (*-isomorphism): Let $\Omega$ be a set and let $\mathcal{F}$ be a $\sigma$-algebra on $\Omega$. A *-isomorphism
between a commutative *-algebra $\mathcal{C}$ and the set of bounded $\mathcal{F}$-measurable functions $\ell^{\infty}(\mathcal{F})$
on $\Omega$ is a linear bijection $\iota : \mathcal{C} \to \ell^{\infty}(\mathcal{F})$ such that $\iota(A^*)(i) = \iota(A)(i)^*$ and $\iota(AB)(i) = \iota(A)(i)\,\iota(B)(i)$ for all $A, B \in \mathcal{C}$ and $i \in \Omega$.

Theorem 2.1 (Spectral theorem): Let $(\mathcal{C}, \mathbb{P})$ be a finite-dimensional commutative quantum
probability space. Then there exists a classical probability space $(\Omega, \mathcal{F}, \mathbf{P})$ and a *-isomorphism
$\iota : \mathcal{C} \to \ell^{\infty}(\mathcal{F})$ such that $\mathbb{P}(A) = E_{\mathbf{P}}[\iota(A)]$, $\forall A \in \mathcal{C}$.

Proof: The theorem is proved by construction. First, let $\mathsf{H} = \mathbb{C}^n$ and $\Omega = \{1, \ldots, n\}$. Since
$[A, A^*] = 0$ $\forall A \in \mathcal{C}$, all the elements in $\mathcal{C}$ can be diagonalized simultaneously. Hence, we can set
$A = \mathrm{diag}\{a_1, \ldots, a_n\}$ and define a classical random variable $\iota(A) : \Omega \to \mathbb{C}$ by $\iota(A)(i) = a_i$.
Let $P$ be a projection in $\mathcal{C}$, i.e., $P = P^* = P^2$; then $\iota(P)$ is the indicator function of a subset
$S_P$ of $\Omega$. We define $\mathcal{F}$ as the set of subsets $S_P$ of $\Omega$ where $P$ runs through the projections
in $\mathcal{C}$. Furthermore, we define a probability measure $\mathbf{P}$ on $\mathcal{F}$ by $\mathbf{P}(S_P) = \mathbb{P}(P)$ for every projection $P \in \mathcal{C}$.
As a result, we have constructed a classical probability space $(\Omega, \mathcal{F}, \mathbf{P})$. It is easy to verify
$E_{\mathbf{P}}[\iota(A)] = \mathbb{P}(A)$. ∎

Note here that any observable $A = A^* \in \mathcal{A}$ is an element of the commutative *-subalgebra
$\mathcal{C} \subset \mathcal{A}$ generated by $A$ itself. Using the spectral theorem we see that we can always realize an
observable $A$ as a classical random variable $\iota(A)$ on a classical probability space $(\Omega, \mathcal{F}, \mathbf{P})$,
where the measure $\mathbf{P}$ is given by the state. If we perform a measurement of $A$, we obtain one
of the values that $\iota(A)$ can take, distributed according to $\mathbf{P}$. Note that if two observables do not
commute with each other, then we cannot represent them both as classical random variables
on the same probability space. Such observables are called incompatible; they cannot both be
measured in a single realization of an experiment.

Example 2.1 (Quantum two-level system): Let $\mathsf{H} = \mathbb{C}^2$ and let $\mathcal{M}$ be the *-algebra of $2 \times 2$
complex matrices. Moreover, let $\psi$ be a state on $\mathcal{M}$. With the quantum probability space $(\mathcal{M}, \psi)$
we can model a two-level system. The state $\psi$ can be written as $\psi(X) = \mathrm{Tr}(\rho X)$, $\forall X \in \mathcal{M}$,
for some operator $\rho$ that is positive and normalized (i.e., $\mathrm{Tr}(\rho) = 1$). Let us now consider a
commutative *-subalgebra $\mathcal{D} = \{D = \mathrm{diag}\{d_1, d_2\} \mid d_1, d_2 \in \mathbb{C}\} \subset \mathcal{M}$. From Theorem 2.1
we can construct a classical probability space that is in one-to-one correspondence with $(\mathcal{D}, \psi)$.
The sample space is $\Omega = \{1, 2\}$, and the set of events is $\mathcal{F} = \{\emptyset, \{1\}, \{2\}, \Omega\}$. A classical
random variable $\iota(D)$ is then defined through $\iota(D)(1) = d_1$ and $\iota(D)(2) = d_2$. Now, $D \in \mathcal{D}$
has a spectral decomposition $D = \sum_i d_i P_i$ with the projection matrices $P_1 = \mathrm{diag}\{1, 0\}$ and
$P_2 = \mathrm{diag}\{0, 1\}$, which yield classical indicator functions $\chi_{\{1\}} = \iota(P_1)$ and $\chi_{\{2\}} = \iota(P_2)$.
Hence, the probability distribution of $\iota(D)$ is given by $\mathbf{P}(\{1\}) = \psi(P_1) = \mathrm{Tr}(\rho P_1) = \rho_{11}$
and $\mathbf{P}(\{2\}) = \rho_{22}$.
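The correspondence in Example 2.1 can be illustrated with a short numerical sketch. The density matrix and the eigenvalues $d_1, d_2$ below are hypothetical values chosen only for illustration; the point is that the classical mean computed from the outcome probabilities agrees with the quantum expectation $\psi(D)$.

```python
import numpy as np

# Hypothetical density matrix rho (positive, unit trace) defining psi(X) = Tr(rho X).
rho = np.array([[0.7, 0.2 - 0.1j],
                [0.2 + 0.1j, 0.3]])

def state(rho, X):
    """Evaluate the state psi(X) = Tr(rho X)."""
    return np.trace(rho @ X)

P1 = np.diag([1.0, 0.0])             # spectral projection for outcome 1
P2 = np.diag([0.0, 1.0])             # spectral projection for outcome 2
d = np.array([2.0, -1.0])            # hypothetical values d_1, d_2 of the observable D
D = d[0] * P1 + d[1] * P2            # D = sum_i d_i P_i

# Classical distribution of iota(D): P({i}) = psi(P_i) = rho_ii.
probs = np.array([state(rho, P1).real, state(rho, P2).real])
classical_mean = float(probs @ d)    # E_P[iota(D)]
quantum_mean = state(rho, D).real    # psi(D)
```

The off-diagonal entries of `rho` play no role here, which reflects the fact that only the commutative subalgebra of diagonal matrices is being observed.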

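Let $(\mathcal{A}_1, \mathbb{P}_1)$ and $(\mathcal{A}_2, \mathbb{P}_2)$ be two quantum probability spaces, defined on the Hilbert spaces
$\mathsf{H}_1$ and $\mathsf{H}_2$, respectively. We will now introduce the composite quantum probability space
$(\mathcal{A}_1 \otimes \mathcal{A}_2, \mathbb{P}_1 \otimes \mathbb{P}_2)$. Let $a_1 \otimes a_2$ be the tensor (Kronecker) product of two vectors $a_1 \in \mathsf{H}_1$ and
$a_2 \in \mathsf{H}_2$. Introducing an inner product $(a_1 \otimes a_2, b_1 \otimes b_2) := (a_1, b_1)(a_2, b_2)$, we have a Hilbert
space $\mathsf{H}_1 \otimes \mathsf{H}_2$. The composite quantum probability space $(\mathcal{A}_1 \otimes \mathcal{A}_2, \mathbb{P}_1 \otimes \mathbb{P}_2)$ is then defined
on $\mathsf{H}_1 \otimes \mathsf{H}_2$ as follows. First, we define an element $A_1 \otimes A_2 \in \mathcal{A}_1 \otimes \mathcal{A}_2$ through the relation
$(A_1 \otimes A_2)(a_1 \otimes a_2) = A_1 a_1 \otimes A_2 a_2$. Any element of $\mathcal{A}_1 \otimes \mathcal{A}_2$ is given as a linear combination
of such elements. Second, the state $\mathbb{P}_1 \otimes \mathbb{P}_2$ is defined by $(\mathbb{P}_1 \otimes \mathbb{P}_2)(A_1 \otimes A_2) = \mathbb{P}_1(A_1)\,\mathbb{P}_2(A_2)$.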
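In matrix terms, the two defining relations of the composite space can be checked with the Kronecker product. The sketch below is only illustrative: the dimensions, the random matrices, and the representation of each state by a density matrix $\rho_k$ with $\mathbb{P}_k(A) = \mathrm{Tr}(\rho_k A)$ (as in Example 2.1) are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def random_density(n):
    """Random n x n density matrix (positive semidefinite, unit trace)."""
    G = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    rho = G @ G.conj().T
    return rho / np.trace(rho).real

A1, A2 = rng.normal(size=(2, 2)), rng.normal(size=(3, 3))   # operators on H_1, H_2
a1, a2 = rng.normal(size=2), rng.normal(size=3)             # vectors in H_1, H_2

# (A1 (x) A2)(a1 (x) a2) = A1 a1 (x) A2 a2
lhs_vec = np.kron(A1, A2) @ np.kron(a1, a2)
rhs_vec = np.kron(A1 @ a1, A2 @ a2)

# (P1 (x) P2)(A1 (x) A2) = P1(A1) P2(A2), with P_k(A) = Tr(rho_k A)
rho1, rho2 = random_density(2), random_density(3)
lhs_state = np.trace(np.kron(rho1, rho2) @ np.kron(A1, A2))
rhs_state = np.trace(rho1 @ A1) * np.trace(rho2 @ A2)
```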

B. Conditional expectation 

Let $(\mathcal{A}, \mathbb{P})$ be a quantum probability space. Let $A$ and $B$ be two commuting self-adjoint
elements of $\mathcal{A}$. Using Theorem 2.1 we can represent $A$ and $B$ as classical random variables
$\iota(A)$ and $\iota(B)$ on a classical probability space $(\Omega, \mathcal{F}, \mathbf{P})$. This allows us to form the classical
conditional expectation $E_{\mathbf{P}}[\iota(A) \mid \iota(B)]$. The quantum conditional expectation $\mathbb{P}(A|B)$ can then
be defined as its pull-back

$$\mathbb{P}(A|B) = \iota^{-1}\big(E_{\mathbf{P}}[\iota(A) \mid \iota(B)]\big).$$

Now suppose that instead of the operator $B$, we want to condition $A$ on a commutative
*-subalgebra $\mathcal{C}$ of $\mathcal{A}$. As long as $A$ commutes with every element in $\mathcal{C}$, we can apply the spectral
theorem to the commutative *-algebra generated by $\mathcal{C}$ and $A$ together, and define

$$\mathbb{P}(A|\mathcal{C}) = \iota^{-1}\big(E_{\mathbf{P}}[\iota(A) \mid \sigma(\iota(\mathcal{C}))]\big), \tag{7}$$

where $\sigma(\iota(\mathcal{C}))$ stands for the classical $\sigma$-algebra generated by $\iota(K)$, $K \in \mathcal{C}$. This shows that
given a commutative *-subalgebra $\mathcal{C}$, we can define the quantum conditional expectation onto
$\mathcal{C}$ for every self-adjoint element in the commutant of $\mathcal{C}$. Here the commutant of $\mathcal{C}$ is given by

$$\mathcal{C}' := \{A \in \mathcal{A} \mid [A, C] = 0 \ \ \forall C \in \mathcal{C}\}.$$
The formal definition of the conditional expectation follows below. It coincides with the standard 
definition of the conditional expectation for operator algebras [43], [41] for the situation we 
are interested in. Note, however, that our definition is more restrictive since we only allow for 
conditional expectations from the commutant of a commutative algebra C onto C itself. 

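Definition 2.5 (Quantum conditional expectation): Let $(\mathcal{A}, \mathbb{P})$ be a quantum probability space,
and let $\mathcal{C}$ be a commutative *-subalgebra of $\mathcal{A}$. Then the map $\mathbb{P}(\,\cdot\,|\mathcal{C}) : \mathcal{C}' \to \mathcal{C}$ is called (a
version of) the quantum conditional expectation from $\mathcal{C}'$ onto $\mathcal{C}$ if $\mathbb{P}(\mathbb{P}(A|\mathcal{C})K) = \mathbb{P}(AK)$ $\forall A \in \mathcal{C}'$, $\forall K \in \mathcal{C}$.

Note that for every self-adjoint element $A \in \mathcal{C}'$, we have given an explicit expression for the
quantum conditional expectation in Eq. (7). Every element $A$ in the commutant can be written
in a unique way as $A = A_1 + iA_2$ with $A_1$ and $A_2$ self-adjoint. If we define the conditional
expectation of $A$ onto $\mathcal{C}$ by linear extension of the definition in Eq. (7), then it is easy to
see that it satisfies the formal definition given in Definition 2.5. This means we have shown
existence of the quantum conditional expectation as defined in Definition 2.5.

Finally, we remark some basic properties that both the classical and quantum conditional
expectations satisfy: (i) $\mathbb{P}(A|\mathcal{C})$ is unique with probability one, (ii) $\mathbb{P}(\mathbb{P}(A|\mathcal{C})) = \mathbb{P}(A)$, (iii)
$\mathbb{P}(CA|\mathcal{C}) = C\,\mathbb{P}(A|\mathcal{C})$ if $C \in \mathcal{C}$ and $A \in \mathcal{C}'$ (module property), and (iv) $\mathbb{P}(\mathbb{P}(A|\mathcal{B})|\mathcal{C}) = \mathbb{P}(A|\mathcal{C})$ if $\mathcal{C} \subset \mathcal{B}$ (tower property). Note that it easily follows from the tower property that
$\mathbb{P}(\,\cdot\,|\mathcal{C})$ is idempotent, i.e., it is a projection. Moreover, similar to the classical case, $\mathbb{P}(A|\mathcal{C})$ is
the least mean square estimate of $A$ given $\mathcal{C}$, i.e.,

$$\|A - \mathbb{P}(A|\mathcal{C})\|^2_{\mathbb{P}} \le \|A - \mathbb{P}(A|\mathcal{C})\|^2_{\mathbb{P}} + \|\mathbb{P}(A|\mathcal{C}) - B\|^2_{\mathbb{P}} = \|A - B\|^2_{\mathbb{P}} \quad \forall B \in \mathcal{C}, \tag{8}$$

where we have defined $\|X\|^2_{\mathbb{P}} := \mathbb{P}(X^*X)$.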

C. Density matrix and quantum relative entropy 



In Example 2.1 we have seen that the state $\mathbb{P}$ can be represented in terms of a matrix $\rho$. In
the finite dimensional case we can always find a unique density matrix $\rho$ that satisfies

$$\mathbb{P}(A) = \mathrm{Tr}(\rho A), \qquad \rho = \rho^* \ge 0, \qquad \mathrm{Tr}\,\rho = 1. \tag{9}$$

The latter two conditions guarantee $\mathbb{P}(A^*A) \ge 0$ $\forall A \in \mathcal{A}$ and $\mathbb{P}(I) = 1$, respectively. In
particular, when $\rho$ is a rank-one projection matrix $\rho = bb^*$, $b \in \mathsf{H}$, then $\mathbb{P}(A)$ is expressed as

$$\mathbb{P}(A) = \mathrm{Tr}(bb^*A) = (b, Ab), \tag{10}$$

where $(\,\cdot\,, \cdot\,)$ denotes the standard Euclidean inner product of two vectors.

In analogy to the classical relative entropy, which has been introduced in Section I, we can
define the quantum relative entropy of two states in terms of their density matrices as

$$R(\rho \| \rho') := \mathrm{Tr}\big[\rho(\log\rho - \log\rho')\big] \quad \text{if } \mathrm{supp}\,\rho \subset \mathrm{supp}\,\rho', \tag{11}$$

where $\mathrm{supp}\,\rho$ represents the linear space spanned by the eigenvectors of $\rho$ with nonzero eigenvalues [33]. If $\mathrm{supp}\,\rho \not\subset \mathrm{supp}\,\rho'$, then $R(\rho \| \rho') := +\infty$. A quantum version of the duality relation (6) is given as follows.

Lemma 2.1 (Duality, see e.g. [33] Prop. 1.11): For any observable $A \in \mathcal{A}$ and density
matrices $\rho$ and $\rho'$, the following relation holds:

$$\log \mathrm{Tr}\big(e^{A + \log\rho'}\big) = \max_{\rho} \big\{ \mathrm{Tr}(\rho A) - R(\rho \| \rho') : \mathrm{supp}\,\rho \subset \mathrm{supp}\,\rho' \big\}. \tag{12}$$

Proof: The proof is straightforward. Defining a density matrix $\rho_o = e^{A + \log\rho'} / \mathrm{Tr}[e^{A + \log\rho'}]$,
we obtain

$$\mathrm{Tr}(\rho A) - R(\rho \| \rho') = \log \mathrm{Tr}\big(e^{A + \log\rho'}\big) - R(\rho \| \rho_o).$$

Then, as the quantum relative entropy $R(\rho \| \rho_o)$ is always non-negative and takes zero only
when $\rho = \rho_o$, we observe that Eq. (12) holds and the maximum is attained only when $\rho = \rho_o$. ∎

We can derive a relaxed form of Eq. (12), expressed in terms of the corresponding states.
From the Golden-Thompson inequality $\mathrm{Tr}(e^{A+B}) \le \mathrm{Tr}(e^A e^B)$ (see [19], [39]) with $A, B$
self-adjoint, we have

$$\log \mathrm{Tr}\big(e^{A + \log\rho'}\big) \le \log \mathrm{Tr}\big(e^A e^{\log\rho'}\big) = \log \mathrm{Tr}\big(e^A \rho'\big).$$

Therefore, denoting the states corresponding to $\rho$ and $\rho'$ by $\mathbb{P}$ and $\mathbb{P}'$ respectively, we have

$$\mathbb{P}(A) \le \log \mathbb{P}'(e^A) + R(\rho \| \rho'). \tag{13}$$

This inequality will be used to show robustness properties of the quantum risk-sensitive
estimator.
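Lemma 2.1 and the relaxed bound (13) lend themselves to a quick numerical sanity check. The sketch below is an illustration under stated assumptions, not a proof: it draws random full-rank $3 \times 3$ density matrices and a random observable, and evaluates matrix functions through the eigendecomposition of Hermitian matrices; all helper names are hypothetical.

```python
import numpy as np

def herm_fun(H, f):
    """Apply a scalar function f to a Hermitian matrix H via its eigendecomposition."""
    w, v = np.linalg.eigh(H)
    return (v * f(w)) @ v.conj().T

def random_density(n, rng):
    """Random full-rank n x n density matrix."""
    G = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    rho = G @ G.conj().T
    return rho / np.trace(rho).real

def rel_entropy(rho, rho_p):
    """Quantum relative entropy R(rho||rho') of Eq. (11), assuming full-rank arguments."""
    return np.trace(rho @ (herm_fun(rho, np.log) - herm_fun(rho_p, np.log))).real

rng = np.random.default_rng(2)
n = 3
rho, rho_p = random_density(n, rng), random_density(n, rng)
G = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
A = (G + G.conj().T) / 2                                  # random observable

log_rho_p = herm_fun(rho_p, np.log)
lhs12 = np.log(np.trace(herm_fun(A + log_rho_p, np.exp)).real)   # left side of Eq. (12)

rho_o = herm_fun(A + log_rho_p, np.exp)
rho_o = rho_o / np.trace(rho_o).real                      # maximizer rho_o from the proof
value_at_rho_o = np.trace(rho_o @ A).real - rel_entropy(rho_o, rho_p)
value_at_rho = np.trace(rho @ A).real - rel_entropy(rho, rho_p)

# Golden-Thompson relaxation: log P'(e^A) = log Tr(e^A rho')
gt_bound = np.log(np.trace(herm_fun(A, np.exp) @ rho_p).real)
```

The value at `rho_o` reproduces `lhs12`, an arbitrary `rho` gives a smaller value, and `gt_bound` dominates `lhs12`, which together give inequality (13).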



III. The discrete field and quantum filtering 

In this paper we restrict ourselves to a system that consists of a two-level atom in interaction
with the quantized electromagnetic field. This is merely for reasons of convenience; the theory
can easily be extended to a large class of systems in interaction with the field. In this section
we first introduce a discrete model for the electromagnetic field (see [9] and the references
therein). Second, we describe the interaction between the atomic system and the field. Due to 
the interaction, the field carries off information about the system. In this way, by measuring the 
field, we can perform a noisy observation of the system. Finally, using quantum filtering theory 
we form optimal estimates of the atom observables. The quantum filtering equation recursively 
propagates these estimates in time. 



A. The discrete field 

We first describe the quantum probability space with which we model the electromagnetic
field in a discrete manner. Imagine a one-dimensional field traveling towards a photo detector.
We divide the field into $N$ time slices of length $\lambda^2$. The total measurement time is $T = N\lambda^2$.
If $N$ is large enough, the photo detector detects either zero or one photon in each time interval.
Therefore, if $N$ is large, each slice of the field can to good approximation be regarded as a
two-level system $(\mathcal{M}, \phi)$, see Example 2.1. The vacuum state $\phi$ on $\mathcal{M}$ is given by $\phi(X) = (\Phi, X\Phi)$,
where $\Phi = [0 \ 1]^T$ denotes the so-called vacuum vector. The field can then be constructed
as the $N$-fold tensor product of two-level systems representing the different time slices, i.e.,
$(\mathcal{W}_N, \phi^{\otimes N}) = (\mathcal{M}^{\otimes N}, \phi^{\otimes N})$. In particular, we assume that the system that interacts with the
field is a two-level atomic system $(\mathcal{M}, \psi)$, i.e., the total quantum probability space for system
and field together is given by

$$(\mathcal{M} \otimes \mathcal{W}_N, \mathbb{P}) = (\mathcal{M} \otimes \mathcal{M}^{\otimes N}, \psi \otimes \phi^{\otimes N}). \tag{14}$$

Let $\rho$ be the density matrix corresponding to $\psi$. Then, $\mathbb{P}(X)$ can be written as $\mathbb{P}(X) = \mathrm{Tr}[X(\rho \otimes (\Phi\Phi^*)^{\otimes N})]$ for all $X \in \mathcal{M} \otimes \mathcal{W}_N$.

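Next, we introduce discrete noises. To this end, we define

$$X_l := I^{\otimes (l-1)} \otimes X \otimes I^{\otimes (N-l)} \in \mathcal{W}_N, \qquad l = 1, \ldots, N,$$

where $X$ is a $2 \times 2$ matrix and $I$ is the $2 \times 2$ identity matrix. Using the above notation, let us
define the following noise matrices:

$$\Delta A(l) = \lambda(\sigma_-)_l, \quad \Delta A(l)^* = \lambda(\sigma_+)_l, \quad \Delta\Lambda(l) = (\sigma_+\sigma_-)_l, \quad \Delta t(l) = \lambda^2 I_l, \tag{15}$$

where

$$\sigma_+ = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}, \qquad \sigma_- = \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}. \tag{16}$$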
Furthermore, we define the following so-called fundamental noises living in the first $l$ slices:

$$A(l) = \sum_{i=1}^{l} \Delta A(i), \quad A(l)^* = \sum_{i=1}^{l} \Delta A(i)^*, \quad \Lambda(l) = \sum_{i=1}^{l} \Delta\Lambda(i), \quad t(l) = \sum_{i=1}^{l} \Delta t(i),$$

with the convention $A(0) = A(0)^* = \Lambda(0) = t(0) = 0$. We now provide the following
physical interpretation of the fundamental noises. First, $t_l := \iota(t(l))$ always takes the value
$l\lambda^2 = (l/N)T$, and thus we may regard $t(l)$ as the time. Second, since $\Delta\lambda_l := \iota(\Delta\Lambda(l))$ takes
either the value 0 or the value 1 at time $l$, it is reasonable to interpret $\Lambda(l)$ as the total number of
photons counted by the photo detector up to time $l$. For the vacuum state, we have $\mathrm{Prob}(\Delta\lambda_l = 1) = \phi^{\otimes N}(\mathrm{diag}\{1, 0\}_l) = 0$, which implies that the photo detector detects no photons. Finally,
with regard to $A(l)$ and $A(l)^*$, we introduce an observable $\Delta W(l) := \Delta A(l) + \Delta A(l)^*$ and a
commutative *-algebra generated from $\Delta W(i)$, $(0 \le i \le l)$:

$$\mathcal{C}_l := \mathrm{alg}\{\Delta W(i) = \Delta A(i) + \Delta A(i)^* \mid 0 \le i \le l\}. \tag{17}$$

$\Delta W(l)$ has the following spectral decomposition:

$$\Delta W(l) = \Delta A(l) + \Delta A(l)^* = \begin{pmatrix} 0 & \lambda \\ \lambda & 0 \end{pmatrix}_l = \lambda P^+_l + (-\lambda) P^-_l,$$

with the projection matrices

$$P^{\pm}_l := \frac{1}{2} \begin{pmatrix} 1 & \pm 1 \\ \pm 1 & 1 \end{pmatrix}_l. \tag{18}$$

Thus, for the vacuum state, the classical random variable $\Delta w_l := \iota(\Delta W(l))$ takes $+\lambda$ with
probability $\mathrm{Prob}(+\lambda) = \phi^{\otimes N}(P^+_l) = (\Phi^{\otimes N}, P^+_l \Phi^{\otimes N}) = 1/2$ or $-\lambda$ with probability $\mathrm{Prob}(-\lambda) = 1/2$ at each time. This implies that $\{w_l\}_{l=1,\ldots,N}$ is a symmetric random walk. If we let $N$ go
to infinity and $\lambda$ to 0, but keep the product $T = N\lambda^2$ constant, then it follows from
Donsker's invariance principle (see e.g. [26]) that $w_l$ converges weakly to a classical Brownian
motion. Note that the relation $\Delta W(l)^2 = \Delta t(l)$ becomes $dw_t^2 = dt$ in the limit (see e.g. [37]).
In physics the observable $A(l) + A(l)^*$ is known as a field quadrature, see e.g. [12], [18].
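The single-slice relations used in this subsection can be verified directly with $2 \times 2$ matrices. The following sketch is an illustration only: the value of $\lambda$, the number of slices, and the helper `embed` are assumptions, and the lowering operator is taken so that it annihilates the vacuum vector $\Phi = [0 \ 1]^T$. It checks the spectral decomposition (18), the relation $\Delta W(l)^2 = \Delta t(l)$, the vacuum statistics, and the commutativity of quadrature increments on different slices.

```python
import numpy as np

lam = 0.1                                    # hypothetical slice parameter lambda
sigma_minus = np.array([[0.0, 0.0], [1.0, 0.0]])
sigma_plus = sigma_minus.T
Phi = np.array([0.0, 1.0])                   # vacuum vector

dA = lam * sigma_minus                       # Delta A on one slice
dLambda = sigma_plus @ sigma_minus           # Delta Lambda = diag{1, 0}
dt = lam**2 * np.eye(2)                      # Delta t
dW = dA + dA.T                               # Delta W = Delta A + Delta A^*

P_plus = 0.5 * np.array([[1.0, 1.0], [1.0, 1.0]])
P_minus = 0.5 * np.array([[1.0, -1.0], [-1.0, 1.0]])

prob_plus = Phi @ P_plus @ Phi               # vacuum probability of the outcome +lambda
prob_photon = Phi @ dLambda @ Phi            # vacuum probability of counting a photon

def embed(X, l, N):
    """X_l = I^{(l-1)} (x) X (x) I^{(N-l)} for slices l = 1, ..., N."""
    out = np.eye(1)
    for k in range(1, N + 1):
        out = np.kron(out, X if k == l else np.eye(2))
    return out

N = 3
dW1, dW2 = embed(dW, 1, N), embed(dW, 2, N)
commutator = dW1 @ dW2 - dW2 @ dW1           # increments on different slices commute

# Vacuum quadrature record sampled as a symmetric random walk with steps +-lambda.
rng = np.random.default_rng(3)
steps = rng.choice([lam, -lam], size=1000)
w = np.cumsum(steps)
quad_var = np.sum(steps**2)                  # equals (number of steps) * lambda^2
```

The quadratic variation `quad_var` equals the elapsed time exactly, which is the discrete counterpart of $dw_t^2 = dt$.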



B. System-field interaction 

Let $\mathsf{H}_1$ and $\mathsf{H}_2$ be Hilbert spaces with which we describe two separate quantum systems.
The total interaction between these two systems over the first $l$ time units can be described by
a unitary transformation $U(l)$ that acts on the composite space $\mathsf{H}_1 \otimes \mathsf{H}_2$. The time evolution of
an observable $X$ of the composite system is given by the flow

$$j_l(X) = U(l)^* X U(l).$$

Suppose we start with an observable $X$ that acts non-trivially only on the first system. At time
$l$ this observable is given by $j_l(X) = U(l)^*(X \otimes I)U(l)$, which in general will act non-trivially
on both components in the tensor product $\mathsf{H}_1 \otimes \mathsf{H}_2$. This shows that information has been
carried from the system that lives on $\mathsf{H}_1$ to the system on $\mathsf{H}_2$. Note that a unitary $U$ can always
be represented as $U = e^{-iH}$ for some self-adjoint matrix $H$ called the Hamiltonian.

In our model, a two-level atomic system repeatedly interacts with the slices of the field. Let
$H_{\rm int}(l) \in \mathcal{M} \otimes \mathcal{W}_l$ be the self-adjoint operator given by

$$H_{\rm int}(l) = j_{l-1}(L_1) \otimes \Delta\Lambda(l) + j_{l-1}(L_2) \otimes \Delta A(l)^* + j_{l-1}(L_2^*) \otimes \Delta A(l) + j_{l-1}(L_3) \otimes \Delta t(l), \tag{19}$$

where the $L_i$ are elements of $\mathcal{M}$ ($i = 1, 2, 3$) such that $L_1$ and $L_3$ are self-adjoint. These
system operators determine which kind of interaction between the two-level system and the field
we are considering, i.e., they determine the physics of our problem. See Section VII for two
examples: a dispersive interaction and spontaneous decay. We let $H_{\rm int}(l)$ be the Hamiltonian
for the interaction between the system and the $l$-th field slice, that is,

$$U(l) = \overrightarrow{\prod_{i=1}^{l}}\, e^{-iH_{\rm int}(i)} = e^{-iH_{\rm int}(1)} \cdots e^{-iH_{\rm int}(l)}, \qquad U(0) = I. \tag{20}$$

We define another unitary operator $M_l$ by

$$M_l := e^{-i\{L_1 \otimes \Delta\Lambda(l) + L_2 \otimes \Delta A(l)^* + L_2^* \otimes \Delta A(l) + L_3 \otimes \Delta t(l)\}}. \tag{21}$$

Since $e^{-iH_{\rm int}(l)} = U(l-1)^* M_l U(l-1)$, the unitary operator $U(l)$ satisfies

$$U(l) = U(l-1)\,e^{-iH_{\rm int}(l)} = M_l U(l-1). \tag{22}$$

The operator $M_l$ acts non-identically only on the system and the $l$-th slice of the field. Thus,
$M_l$ can be expressed as

$$M_l := I + M^{\pm} \otimes \Delta\Lambda(l) + M^{+} \otimes \Delta A(l)^* + M^{-} \otimes \Delta A(l) + M^{\circ} \otimes \Delta t(l), \tag{23}$$

for some system operators $M^i \in \mathcal{M}$ ($i = \pm, +, -, \circ$), which are uniquely determined by
$L_i$ ($i = 1, 2, 3$). Note that the unitarity of $M_l$ implies certain relations between the operators
$M^i$, e.g., $M^{\circ} + M^{\circ *} + M^{+*}M^{+} + \lambda^2 M^{\circ *}M^{\circ} = 0$. From now on, we will use $M_l$ and $M^i$
instead of $H_{\rm int}(l)$ and $L_i$ to describe the interaction. We can write the following difference
equation for the unitary $U(l)$:

$$\Delta U(l) = U(l) - U(l-1) = \big[M^{\pm}\Delta\Lambda(l) + M^{+}\Delta A(l)^* + M^{-}\Delta A(l) + M^{\circ}\Delta t(l)\big]U(l-1). \tag{24}$$

For simplicity we have omitted the tensor product $\otimes$ between $M^i$ and the noise operators. This
rule will be applied throughout this paper. The dynamics (24) is called the quantum stochastic
difference equation. It is a discrete version of the Hudson-Parthasarathy equation [23].
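The passage from the operators $L_i$ to the coefficients $M^i$ can be made concrete numerically. The sketch below is a hedged illustration: the particular $L_1, L_2, L_3$ and the value of $\lambda$ are hypothetical choices (they are not the examples of Section VII), and the function names are invented here. It builds the one-slice unitary of Eq. (21), reads the coefficients of Eq. (23) off its blocks with respect to the slice basis, and evaluates the unitarity relation quoted above.

```python
import numpy as np

sm = np.array([[0, 0], [1, 0]], dtype=complex)     # sigma_- on the field slice
sp = sm.conj().T                                    # sigma_+ on the field slice

def interaction_unitary(L1, L2, L3, lam):
    """One-slice unitary M of Eq. (21) on (system) x (slice), via eigendecomposition."""
    H = (np.kron(L1, sp @ sm)                       # L1 (x) Delta Lambda
         + lam * np.kron(L2, sp)                    # L2 (x) Delta A^*
         + lam * np.kron(L2.conj().T, sm)           # L2^* (x) Delta A
         + lam**2 * np.kron(L3, np.eye(2)))         # L3 (x) Delta t
    w, v = np.linalg.eigh(H)
    return (v * np.exp(-1j * w)) @ v.conj().T

def coefficients(M, lam):
    """Read off (M^pm, M^+, M^-, M^o) of Eq. (23) from the blocks of M."""
    n = M.shape[0] // 2
    B = M.reshape(n, 2, n, 2)       # (system row, slice row, system col, slice col)
    ee, ev = B[:, 0, :, 0], B[:, 0, :, 1]
    ve, vv = B[:, 1, :, 0], B[:, 1, :, 1]   # slice index 1 is the vacuum vector
    M_zero = (vv - np.eye(n)) / lam**2
    return ee - vv, ev / lam, ve / lam, M_zero

lam = 0.1                                           # hypothetical lambda
sz = np.diag([1.0, -1.0]).astype(complex)
L1, L2, L3 = 0.2 * sz, 1j * sm, 0.5 * sz            # hypothetical system operators

M = interaction_unitary(L1, L2, L3, lam)
M_pm, M_plus, M_minus, M_zero = coefficients(M, lam)

# Rebuild M from Eq. (23) and evaluate the unitarity relation from the text.
M_rec = (np.eye(4) + np.kron(M_pm, sp @ sm) + lam * np.kron(M_plus, sp)
         + lam * np.kron(M_minus, sm) + lam**2 * np.kron(M_zero, np.eye(2)))
residual = (M_zero + M_zero.conj().T + M_plus.conj().T @ M_plus
            + lam**2 * M_zero.conj().T @ M_zero)
```

The residual vanishes up to rounding because it is, up to the factor $\lambda^2$, the vacuum-vacuum block of $M_l^* M_l - I$.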

Next we describe a measurement performed on the field. Let us again consider the field
observables $W(l) = A(l) + A(l)^*$, $(l = 0, \ldots, N)$. After the interaction, these observables are
given by

$$Y(l) := j_l(W(l)) = U(l)^*[A(l) + A(l)^*]U(l), \qquad 0 \le l \le N. \tag{25}$$

The observation process $Y(l)$, $(l = 0, \ldots, N)$ satisfies the following difference equation

$$\Delta Y(l) = U(l)^*[\Delta A(l) + \Delta A(l)^*]U(l) = j_l(\Delta W(l)). \tag{26}$$

Here we have used Eq. (22) and $[M_l, A(l-1)] = 0$. Moreover, using $[M_k, \Delta W(l)] = 0$ $(k \ge l+1)$ we find that

$$\Delta Y(l) = U(k)^* \Delta W(l) U(k) = j_k(\Delta W(l)), \tag{27}$$

for all $k \ge l$. Therefore we find

$$[\Delta Y(i), \Delta Y(j)] = 0, \qquad \forall i, j. \tag{28}$$

This means that the algebra generated by the observations

$$\mathcal{Y}_l = \mathrm{alg}\{\Delta Y(i) \mid 0 \le i \le l\}, \tag{29}$$

is a commutative *-algebra for all $0 \le l \le N$. This is called the self-nondemolition property
of the observations $Y(l)$. Due to this property, we can define the classical process $\Delta y_l = \iota(\Delta Y(l))$, $(l = 0, \ldots, N)$. This classical process represents the data that we obtain while
running the measurement. Note that $\Delta Y(l)$ has the following spectral decomposition

$$\Delta Y(l) = U(l)^* \Delta W(l) U(l) = \lambda U(l)^* P^+_l U(l) + (-\lambda) U(l)^* P^-_l U(l), \tag{30}$$

where the projection matrices $P^+_l$ and $P^-_l$ are given by Eq. (18). Hence, from Theorem 2.1
the classical random variable $\Delta y_l = \iota(\Delta Y(l))$ takes $+\lambda$ with probability $\mathrm{Prob}(+\lambda) = \psi \otimes \phi^{\otimes N}(U(l)^* P^+_l U(l))$, which now depends on the interaction, or $-\lambda$ with probability $\mathrm{Prob}(-\lambda) = 1 - \mathrm{Prob}(+\lambda)$.
C. Quantum filtering 

The purpose of quantum filtering is to calculate the least mean square estimate of the
observable $j_l(X) = U(l)^* X U(l)$ for a given system observable $X \in \mathcal{M}$. More specifically,
we aim to find an element in the commutative *-algebra $\mathcal{Y}_l$ that minimizes the mean square
error, i.e., $Z^{\rm opt}_l = \operatorname{argmin}_{Z_l \in \mathcal{Y}_l} \mathbb{P}(|j_l(X) - Z_l|^2)$, where $|A|^2 := A^*A$ for an operator $A$ on a
Hilbert space. Note that Eq. (27) leads to the following nondemolition property

$$[j_l(X), \Delta Y(k)] = 0, \qquad \forall N \ge l \ge k \ge 0, \tag{31}$$

which implies that $j_l(X) \in \mathcal{Y}_l'$, $\forall l$. Due to the self-nondemolition and nondemolition properties
the quantum conditional expectation $\mathbb{P}(j_l(X)|\mathcal{Y}_l)$ exists. Moreover, in Subsection II-B we have
seen that the quantum conditional expectation is an optimal estimator. Therefore the optimal
estimator $Z^{\rm opt}_l$ is given by $Z^{\rm opt}_l = \mathbb{P}(j_l(X)|\mathcal{Y}_l)$.

Since $\mathbb{P}(j_l(X)|\mathcal{Y}_l)$ is linear, positive with respect to $X$, and normalized, i.e., $\mathbb{P}(j_l(I)|\mathcal{Y}_l) = I$,
we can define an information state on the two-level atomic system by

$$\pi_l(X) = \mathbb{P}(j_l(X)|\mathcal{Y}_l), \qquad X \in \mathcal{M}. \tag{32}$$

Note that the state $\pi_l$ on $\mathcal{M}$ is stochastic: it depends on the observations up to time $l$. We are
now going to derive a difference equation for $\pi_l(X)$, i.e., the quantum filter. The following
noncommutative Bayes formula [10] is useful to derive the filter:

$$\pi_l(X) = \mathbb{P}(j_l(X)|\mathcal{Y}_l) = \frac{U(l)^*\,\mathbb{P}(V(l)^* X V(l)|\mathcal{C}_l)\,U(l)}{U(l)^*\,\mathbb{P}(V(l)^* V(l)|\mathcal{C}_l)\,U(l)}. \tag{33}$$

Here, $\mathcal{C}_l$ is the commutative *-algebra defined in Eq. (17) and $V(l)$ is the unique solution to
the following difference equation:

$$\Delta V(l) = \big[M^{+}\Delta W(l) + M^{\circ}\Delta t(l)\big]V(l-1), \qquad V(0) = I. \tag{34}$$



We note that the conditional expectation in Eq. (33) is well defined, because $V(l)$ is driven by
$\Delta t(l)$ and $\Delta W(l)$ and thus commutes with any element of $\mathcal{C}_l$. This means that $V(l)^* X V(l)$
is an element of $\mathcal{C}_l'$ for all system observables $X \in \mathcal{M}$. We now introduce an unnormalized
information state $\sigma_l$ by $\sigma_l(X) := U(l)^*\,\mathbb{P}(V(l)^* X V(l)|\mathcal{C}_l)\,U(l)$ for all $X \in \mathcal{M}$. Eq. (33) now
reads $\pi_l(X) = \sigma_l(X)/\sigma_l(I)$, which is a noncommutative analogue of the classical
Kallianpur-Striebel formula [?]. It easily follows from Eqs. (24) and (34) that $\sigma_l(X)$ satisfies the following
difference equation

$$\Delta\sigma_l(X) = \sigma_{l-1}(\mathcal{L}(X))\,\Delta t(l) + \sigma_{l-1}(\mathcal{J}(X))\,\Delta Y(l), \qquad \sigma_0 = \psi, \tag{35}$$

where the operators $\mathcal{L}$ and $\mathcal{J}$ are given by

$$\mathcal{L}(X) := M^{+*} X M^{+} + \lambda^2 M^{\circ *} X M^{\circ} + X M^{\circ} + M^{\circ *} X,$$
$$\mathcal{J}(X) := \lambda^2 M^{+*} X M^{\circ} + \lambda^2 M^{\circ *} X M^{+} + X M^{+} + M^{+*} X. \tag{36}$$

The filter can now be obtained immediately from $\pi_l(X) = \sigma_l(X)/\sigma_l(I)$. We, however, will
always use the unnormalized version of the filter given in Eq. (35).

Note that $\sigma_l(X)$, $\Delta t(l)$, and $\Delta Y(l)$ are all elements in the commutative *-algebra $\mathcal{Y}_l$. Due
to Theorem 2.1 they can be diagonalized simultaneously, which yields the following classical
random variables $\iota(\sigma_l(X))$, $\Delta t_l = \iota(\Delta t(l)) = \lambda^2$, and $\Delta y_l = \iota(\Delta Y(l))$. Moreover, since
$\iota(\sigma_l(X))$ is a linear and positive functional of $X$, we can define a $2 \times 2$ positive semidefinite
matrix $\varrho_l$ that satisfies $\iota(\sigma_l(X)) = \mathrm{Tr}(\varrho_l X)$. The unnormalized density matrix $\varrho_l$ is called the
unnormalized information density matrix. It is easy to derive a difference equation for $\varrho_l$:

$$\Delta\varrho_l = \mathcal{L}(\varrho_{l-1})\,\Delta t_l + \mathcal{J}(\varrho_{l-1})\,\Delta y_l, \qquad \varrho_0 = \rho, \tag{37}$$

where the operators $\mathcal{L}$ and $\mathcal{J}$, now acting on density matrices, are given by

$$\mathcal{L}(\varrho) := M^{+}\varrho M^{+*} + \lambda^2 M^{\circ}\varrho M^{\circ *} + M^{\circ}\varrho + \varrho M^{\circ *}, \tag{38}$$
$$\mathcal{J}(\varrho) := \lambda^2 M^{\circ}\varrho M^{+*} + \lambda^2 M^{+}\varrho M^{\circ *} + M^{+}\varrho + \varrho M^{+*}. \tag{39}$$
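The recursion (37)-(39) translates almost line by line into code. The sketch below is a minimal illustration, not the simulation code of Section VII: the coefficients `M_plus` and `M_zero`, the initial state, the observable, and the synthetic measurement record are all hypothetical. It also uses an observation that follows from expanding the product and using $\Delta y_l^2 = \lambda^2$: one step of Eq. (37) equals $K\varrho_{l-1}K^*$ with $K = I + M^{+}\Delta y_l + M^{\circ}\lambda^2$, which makes the positivity of $\varrho_l$ evident.

```python
import numpy as np

def filter_step(rho, dy, M_plus, M_zero, lam):
    """One step of Eq. (37): rho_l = rho_{l-1} + L(rho_{l-1}) lam^2 + J(rho_{l-1}) dy."""
    Mp, Mo = M_plus, M_zero
    Mp_d, Mo_d = Mp.conj().T, Mo.conj().T
    L = Mp @ rho @ Mp_d + lam**2 * Mo @ rho @ Mo_d + Mo @ rho + rho @ Mo_d      # Eq. (38)
    J = lam**2 * (Mo @ rho @ Mp_d + Mp @ rho @ Mo_d) + Mp @ rho + rho @ Mp_d    # Eq. (39)
    return rho + L * lam**2 + J * dy

def kraus_step(rho, dy, M_plus, M_zero, lam):
    """Equivalent form K rho K^*, valid when dy is +lam or -lam."""
    K = np.eye(rho.shape[0]) + M_plus * dy + M_zero * lam**2
    return K @ rho @ K.conj().T

lam = 0.05                                               # hypothetical lambda
M_plus = np.array([[0, 0], [1, 0]], dtype=complex)       # hypothetical coefficient M^+
M_zero = -0.5 * M_plus.conj().T @ M_plus                 # hypothetical coefficient M^o
rho0 = np.array([[0.5, 0.5], [0.5, 0.5]], dtype=complex) # initial density matrix
X = np.diag([1.0, -1.0])                                 # system observable to estimate

rng = np.random.default_rng(4)
record = rng.choice([lam, -lam], size=200)               # synthetic record, not real data
rho = rho0.copy()
for dy in record:
    rho = filter_step(rho, dy, M_plus, M_zero, lam)

# Normalized estimate pi_l(X) = sigma_l(X) / sigma_l(I) = Tr(rho X) / Tr(rho).
estimate = np.trace(rho @ X).real / np.trace(rho).real
```

Working with the unnormalized matrix and dividing by its trace only when an estimate is needed mirrors the choice made in the text to propagate Eq. (35) rather than the normalized filter.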

IV. Composition of an operator-valued function and an observable 

In the following section we will formulate risk-sensitive estimation as an optimal control
problem for a given cost function, see Eqs. (43) and (44) below. The idea of risk-sensitive
control is to absorb the running cost of the cost function into the dynamics, see Eq. (48) below.
This means that the new dynamics depends on past estimates (the controls in the optimal
control formulation of risk-sensitive estimation), which are functions of the observations thus
far. Therefore we need to make mathematically precise what we mean by operator coefficients
(see for example the coefficients in Eq. (49) below) that depend on a function of the observations
thus far. We address this topic in this section.

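Let $\mathcal{A}_1$ be a finite dimensional *-algebra and let $\mathcal{A}_2$ be a commutative finite dimensional *-algebra. Let $K$ be an $\mathcal{A}_1$-valued function on $\mathbb{C}$, i.e., $K : \mathbb{C} \ni u \mapsto K(u) \in \mathcal{A}_1$. Let $\mathrm{u}$ be an element in $\mathcal{A}_2$. Note that since $\mathcal{A}_2$ is commutative, we have $\mathrm{u}^*\mathrm{u} = \mathrm{u}\mathrm{u}^*$, i.e., $\mathrm{u}$ is normal. The spectral decomposition of $\mathrm{u}$ can be written as $\mathrm{u} = \sum_{x \in \mathrm{sp}(\mathrm{u})} x P_{\mathrm{u}}(x)$, where $\mathrm{sp}(\mathrm{u})$ denotes the spectrum of $\mathrm{u}$, i.e., the set of eigenvalues of $\mathrm{u}$. The composition $K(\mathrm{u}) \in \mathcal{A}_1 \otimes \mathcal{A}_2$ of $K$ with $\mathrm{u}$ is defined as [9]

$$K(\mathrm{u}) := \sum_{x \in \mathrm{sp}(\mathrm{u})} K(x) P_{\mathrm{u}}(x) \in \mathcal{A}_1 \otimes \mathcal{A}_2. \tag{40}$$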

This is a natural generalization of the composition of $K$ with a classical random variable $a : (\Omega, \mathcal{F}, P) \to \mathbb{C}$, given by

$$K(a)(\omega) := \sum_{x \in \mathrm{ran}(a)} K(x)\chi_{\{a = x\}}(\omega) \in \mathcal{A}_1. \tag{41}$$

Here $\mathrm{ran}(a)$ denotes the range of $a$ and $\chi_{\{a = x\}}$ is the indicator function of the set $\{\omega \in \Omega \mid a(\omega) = x\}$.
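The composition in Eq. (40) can be illustrated numerically. The following is only a minimal sketch, assuming that the commutative element is a diagonal matrix and taking $K(x) = |\sigma_z - x|^2$ as an example function; the helper names `kron`, `add`, and `compose` are hypothetical and are not part of the paper.

```python
# Sketch of Eq. (40) for a diagonal (hence normal) element u = diag(u_diag).
# Matrices are nested lists; every name below is an illustrative assumption.

def kron(a, b):
    """Kronecker product of two square matrices given as nested lists."""
    m = len(b)
    n = len(a) * m
    return [[a[i // m][j // m] * b[i % m][j % m] for j in range(n)]
            for i in range(n)]

def add(a, b):
    """Entrywise sum of two matrices of equal size."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def compose(K, u_diag, dim_K):
    """K(u) = sum over x in sp(u) of K(x) (tensor) P_u(x)."""
    m = len(u_diag)
    total = [[0.0] * (dim_K * m) for _ in range(dim_K * m)]
    for x in set(u_diag):  # the spectrum sp(u)
        # Spectral projection P_u(x): ones where the diagonal entry equals x.
        P = [[1.0 if (i == j and u_diag[i] == x) else 0.0 for j in range(m)]
             for i in range(m)]
        total = add(total, kron(K(x), P))
    return total

def K(x):
    """Example function: K(x) = |sigma_z - x|^2 for real x (a diagonal 2x2 matrix)."""
    return [[(1 - x) ** 2, 0.0], [0.0, (-1 - x) ** 2]]

# u = diag(0, 0.5, 0.5): the eigenvalue 0.5 has a two-dimensional eigenspace.
Ku = compose(K, [0.0, 0.5, 0.5], dim_K=2)
```

On each eigenspace of the diagonal element, the resulting block matrix simply equals $K$ evaluated at the corresponding eigenvalue, which is the content of Eq. (40).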

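Let $\mathrm{u}_l$ be an element of the observation algebra $\mathcal{Y}_l$, defined in Eq. (29). This means that we can write $\mathrm{u}_l$ as a function of $\Delta Y(i)$, $1 \le i \le l$:

$$\mathrm{u}_l = f_l(\Delta Y(1), \ldots, \Delta Y(l)) \in \mathcal{Y}_l,$$

for some function $f_l : \mathbb{R}^l \to \mathbb{C}$. Moreover, we can also write $\mathrm{u}_l$ in terms of the observables $\Delta W(i) = \Delta A(i) + \Delta A(i)^*$, $1 \le i \le l$, as

$$\mathrm{u}_l = f_l(j_l(\Delta W(1)), \ldots, j_l(\Delta W(l))) = j_l(f_l(\Delta W(1), \ldots, \Delta W(l))),$$

where we have used Eq. (27). Therefore, if we define an element $\check{\mathrm{u}}_l$ in $\mathcal{C}_l$ by

$$\check{\mathrm{u}}_l := f_l(\Delta W(1), \ldots, \Delta W(l)) \in \mathcal{C}_l,$$

then $\mathrm{u}_l$ can be written as $\mathrm{u}_l = j_l(\check{\mathrm{u}}_l)$. An $\mathcal{M}$-valued function $K : \mathbb{C} \ni u \mapsto K(u) \in \mathcal{M}$ and an element $\check{\mathrm{u}}_l$ in $\mathcal{C}_l$ give rise to the composition $K(\check{\mathrm{u}}_l)$, which is an element in $\mathcal{M} \otimes \mathcal{C}_l$. Denoting the spectral decomposition of $\check{\mathrm{u}}_l$ as $\check{\mathrm{u}}_l = \sum_{x \in \mathrm{sp}(\check{\mathrm{u}}_l)} x P_{\check{\mathrm{u}}_l}(x)$, we obtain

$$j_l(K(\check{\mathrm{u}}_l)) = \sum_{x \in \mathrm{sp}(\check{\mathrm{u}}_l)} U(l)^* K(x) U(l)\, U(l)^* P_{\check{\mathrm{u}}_l}(x) U(l) = \sum_{x \in \mathrm{sp}(\check{\mathrm{u}}_l)} j_l(K(x)) P_{U(l)^*\check{\mathrm{u}}_l U(l)}(x)$$

$$= \sum_{x \in \mathrm{sp}(\mathrm{u}_l)} j_l(K(x)) P_{\mathrm{u}_l}(x) =: j_l(K)(\mathrm{u}_l), \tag{42}$$

where we have introduced the notation $j_l(K)(\mathrm{u}_l)$ in the last step. Note that $j_l(K)(\mathrm{u}_l)$ is an element in $U(l)^*(\mathcal{M} \otimes \mathcal{C}_l)U(l)$.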
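V. Quantum risk-sensitive filtering

In this section we study a quantum risk-sensitive estimation problem. Let $X_e$ be a fixed element of the two-level atomic system $\mathcal{M}$. Then, the risk-sensitive estimator of $j_l(X_e)$ is defined as follows:

$$\big(\hat{X}_e^{\mu}(1), \ldots, \hat{X}_e^{\mu}(N)\big) := \operatorname*{argmin}_{\mathrm{u}_1 \in \mathcal{Y}_1, \ldots, \mathrm{u}_N \in \mathcal{Y}_N} F(\mathrm{u}_1, \ldots, \mathrm{u}_N), \tag{43}$$

where the cost function $F$ is given by

$$F(\mathrm{u}_1, \ldots, \mathrm{u}_N) := \mathbb{P}\Big[R(N)^* \exp\big(\mu_2 |j_N(X_e) - \mathrm{u}_N|^2\big) R(N)\Big], \tag{44}$$

and the matrix $R(l) \in \mathcal{M} \otimes \mathcal{W}_{l-1}$ is given by

$$R(l) = \overleftarrow{\prod_{i=1}^{l-1}} \exp\Big[\frac{\mu_1\lambda^2}{2} |j_i(X_e) - \mathrm{u}_i|^2\Big], \qquad R(1) = R(0) = I. \tag{45}$$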



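Note again that $|A|^2 := A^*A$. Here, $\mu = (\mu_1, \mu_2)$ are weighting parameters that represent risk-sensitivity. Using the $\mathcal{M}$-valued function

$$K : \mathbb{C} \to \mathcal{M} : \quad K(u) = |X_e - u|^2,$$

we can write $K(\check{\mathrm{u}}_l) = |X_e - \check{\mathrm{u}}_l|^2$ and $j_l(K)(\mathrm{u}_l) = |j_l(X_e) - \mathrm{u}_l|^2$. Using these compositions, we can obtain a recursive form of $R(l)$:

$$R(l) = \overleftarrow{\prod_{i=1}^{l-1}} e^{\mu_1\lambda^2 j_i(K)(\mathrm{u}_i)/2} = \overleftarrow{\prod_{i=1}^{l-1}} j_i\big(e^{\mu_1\lambda^2 K(\check{\mathrm{u}}_i)/2}\big) = j_{l-1}\big(e^{\mu_1\lambda^2 K(\check{\mathrm{u}}_{l-1})/2}\big) R(l-1). \tag{46}$$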



Remark 5.1: If all matrices in Eqs. (44) and (45) commute with each other, the quantum risk-sensitive estimator reduces to

$$\big(\hat{X}_e^{\mu}(1), \ldots, \hat{X}_e^{\mu}(N)\big) = \operatorname*{argmin}_{\mathrm{u}_1 \in \mathcal{Y}_1, \ldots, \mathrm{u}_N \in \mathcal{Y}_N} \mathbb{P}\Big[\exp\Big(\mu_1\lambda^2 \sum_{i=1}^{N-1} |j_i(X_e) - \mathrm{u}_i|^2 + \mu_2 |j_N(X_e) - \mathrm{u}_N|^2\Big)\Big],$$

which is identical to the definition of the (generalized) classical risk-sensitive estimator in Eq. (2). Hence, Eq. (43) is a natural noncommutative extension of the classical risk-sensitive estimator to the quantum case.

Remark 5.2: In the limit $\mu_1, \mu_2 \to 0$, $\hat{X}_e^{\mu}(l)$ coincides with the standard quantum optimal estimator $\pi_l(X)$ in Eq. (32). This is easily seen as follows. The estimation error cost function in Eq. (44) is expanded to first order in $\mu_1$ and $\mu_2$ as

$$F(\mathrm{u}_1, \ldots, \mathrm{u}_N) = 1 + \mu_1\lambda^2 \sum_{i=1}^{N-1} \mathbb{P}\big(|j_i(X_e) - \mathrm{u}_i|^2\big) + \mu_2 \mathbb{P}\big(|j_N(X_e) - \mathrm{u}_N|^2\big) + o(\mu_1, \mu_2).$$

Thus, in the limit $\mu_1, \mu_2 \to 0$, the minimizers of this function are given by $\mathrm{u}_l^{\mathrm{opt}} = \pi_l(X_e)$ $(l = 1, \ldots, N)$, i.e., we have

$$\lim_{\mu_1, \mu_2 \to 0} \hat{X}_e^{\mu}(l) = \pi_l(X_e). \tag{47}$$

For this reason, $\pi_l(X_e)$ is called the risk-neutral estimator.

The remainder of this section is organized as follows. First, we introduce a risk-sensitive information density matrix $\varrho_l^{\mu}$, which is the quantum analogue of the classical information state $\sigma_l^{\mu}(x)$ discussed in Section I-A. Second, we derive a recursive equation for $\varrho_l^{\mu}$. As in the standard quantum filtering case, $\varrho_l^{\mu}$ contains all information needed to calculate the estimator (43). More specifically, Eq. (44) can be represented only in terms of $\varrho_l^{\mu}$, see Section V-C.
A. Quantum risk-sensitive information state

We start by defining the following modification of the unitaries given by the difference equation (24):

$$U^{\mu}(l) := U(l)R(l). \tag{48}$$

Here $R(l)$ is given by Eq. (45). Note that $R(l)$ depends on $\mu = (\mu_1, \mu_2)$, but only through $\mu_1$. Using Eqs. (22) and (46), we find the following difference equation for $U^{\mu}(l)$:

$$\Delta U^{\mu}(l) = U(l)R(l) - U^{\mu}(l-1)$$

$$= M_l U(l-1)\, U(l-1)^* e^{\mu_1\lambda^2 K_{l-1}/2} U(l-1) R(l-1) - U^{\mu}(l-1)$$

$$= \big[M_l e^{\mu_1\lambda^2 K_{l-1}/2} - I\big] U^{\mu}(l-1), \qquad U^{\mu}(0) = I.$$

Here, we have used $K_{l-1}$ as a shorthand for $K(\check{\mathrm{u}}_{l-1})$. Using Eq. (23), this can be rewritten as

$$\Delta U^{\mu}(l) = \Big[\Big\{M^{\circ} e^{\mu_1\lambda^2 K_{l-1}/2} + \frac{1}{\lambda^2}\big(e^{\mu_1\lambda^2 K_{l-1}/2} - I\big)\Big\}\Delta t(l) + M^{\pm} e^{\mu_1\lambda^2 K_{l-1}/2}\Delta\Lambda(l) + M^{+} e^{\mu_1\lambda^2 K_{l-1}/2}\Delta A(l)^* + M^{-} e^{\mu_1\lambda^2 K_{l-1}/2}\Delta A(l)\Big] U^{\mu}(l-1).$$
Now, let us define $V^{\mu}(l)$ as the solution to the following difference equation

$$\Delta V^{\mu}(l) = \Big[\Big\{M^{\circ} e^{\mu_1\lambda^2 K_{l-1}/2} + \frac{1}{\lambda^2}\big(e^{\mu_1\lambda^2 K_{l-1}/2} - I\big)\Big\}\Delta t(l) + M^{+} e^{\mu_1\lambda^2 K_{l-1}/2}\Delta W(l)\Big] V^{\mu}(l-1), \tag{49}$$

with $V^{\mu}(0) = I$. Note that this equation is identical to Eq. (34) when $\mu_1 = 0$. Two crucial properties of $V^{\mu}(l)$ are given in the following lemma.

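Lemma 5.1: For all $1 \le l \le N$ the matrix $V^{\mu}(l)$ is an element of $\mathcal{M} \otimes \mathcal{C}_l \subset \mathcal{C}_l'$. Moreover, we have

$$\mathbb{P}\big[U^{\mu}(l)^* X U^{\mu}(l)\big] = \mathbb{P}\big[V^{\mu}(l)^* X V^{\mu}(l)\big] \tag{50}$$

for any $X$ in $\mathcal{M} \otimes \mathcal{W}_N$.

Proof: To prove the first assertion, we assume that $V^{\mu}(l-1) \in \mathcal{M} \otimes \mathcal{C}_{l-1}$. Since $V^{\mu}(l)$ is calculated recursively, using $\Delta W(l)$, $\Delta t(l)$, and $V^{\mu}(l-1)$, all of which are included in $\mathcal{M} \otimes \mathcal{C}_l$, we obtain $V^{\mu}(l) \in \mathcal{M} \otimes \mathcal{C}_l$. The assertion follows by induction.

For the second claim, we note that $U^{\mu}(l)\, v \otimes \Phi^{\otimes N} = V^{\mu}(l)\, v \otimes \Phi^{\otimes N}$ holds for all vectors $v \in \mathbb{C}^2$ due to the relations $\Delta\Lambda(l)\Phi^{\otimes N} = \Delta A(l)\Phi^{\otimes N} = 0$ and $U^{\mu}(0) = V^{\mu}(0) = I$. Therefore, when the system density matrix is of the form $\rho = vv^*$, any $X \in \mathcal{M} \otimes \mathcal{W}_N$ satisfies

$$\big\langle v \otimes \Phi^{\otimes N},\, U^{\mu}(l)^* X U^{\mu}(l)\, v \otimes \Phi^{\otimes N}\big\rangle = \big\langle v \otimes \Phi^{\otimes N},\, V^{\mu}(l)^* X V^{\mu}(l)\, v \otimes \Phi^{\otimes N}\big\rangle,$$

which directly implies Eq. (50) due to Eq. (10). Since every density matrix $\rho$ is a convex combination of vector states, the lemma is proved. ■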
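Definition 5.1: Since by Lemma 5.1 $V^{\mu}(l)$ is an element of the commutant of $\mathcal{C}_l$, we can define the following unnormalized risk-sensitive information state [14]:

$$\sigma_l^{\mu}(X) := U(l)^*\, \mathbb{P}\big(V^{\mu}(l)^* X V^{\mu}(l) \,\big|\, \mathcal{C}_l\big)\, U(l) \in \mathcal{Y}_l. \tag{51}$$

Moreover, we define the unnormalized risk-sensitive information density matrix $\varrho_l^{\mu}$ corresponding to $\sigma_l^{\mu}$ by

$$\iota(\sigma_l^{\mu}(X)) = \mathrm{Tr}(\varrho_l^{\mu} X), \qquad \forall X \in \mathcal{M}. \tag{52}$$

Lemma 5.2: Let $\check{\mathrm{u}}_l$ be an element in $\mathcal{C}_l$. Let $Z : \mathbb{C} \to \mathcal{M}$ be an $\mathcal{M}$-valued function. Then we have

$$\iota\big(\sigma_l^{\mu}(Z(\check{\mathrm{u}}_l))\big) = \mathrm{Tr}\big[\varrho_l^{\mu} Z(u_l)\big],$$

where $u_l = \iota(j_l(\check{\mathrm{u}}_l)) = \iota(\mathrm{u}_l)$ is a function of $\Delta y_i = \iota(\Delta Y(i))$, $1 \le i \le l$.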
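Proof: Denote the spectral decomposition of $\check{\mathrm{u}}_l$ by $\check{\mathrm{u}}_l = \sum_{x \in \mathrm{sp}(\check{\mathrm{u}}_l)} x P_{\check{\mathrm{u}}_l}(x) \in \mathcal{C}_l$. Then, it follows from the definitions (40) and (51) that we have

$$\sigma_l^{\mu}(Z(\check{\mathrm{u}}_l)) = U(l)^*\, \mathbb{P}\Big[V^{\mu}(l)^* \Big(\sum_{x \in \mathrm{sp}(\check{\mathrm{u}}_l)} Z(x) P_{\check{\mathrm{u}}_l}(x)\Big) V^{\mu}(l) \,\Big|\, \mathcal{C}_l\Big] U(l)$$

$$= \sum_{x \in \mathrm{sp}(\check{\mathrm{u}}_l)} U(l)^*\, \mathbb{P}\big(V^{\mu}(l)^* Z(x) V^{\mu}(l) \,\big|\, \mathcal{C}_l\big) U(l)\, U(l)^* P_{\check{\mathrm{u}}_l}(x) U(l)$$

$$= \sum_{x \in \mathrm{sp}(\check{\mathrm{u}}_l)} \sigma_l^{\mu}(Z(x)) P_{U(l)^*\check{\mathrm{u}}_l U(l)}(x) = \sum_{x \in \mathrm{sp}(\mathrm{u}_l)} \sigma_l^{\mu}(Z(x)) P_{\mathrm{u}_l}(x).$$

In the first step we used $P_{\check{\mathrm{u}}_l}(x) \in \mathcal{C}_l$ and $[P_{\check{\mathrm{u}}_l}(x), V^{\mu}(l)] = 0$. Note that $\sigma_l^{\mu}(Z(x)) \in \mathcal{Y}_l$ and $P_{\mathrm{u}_l}(x) \in \mathcal{Y}_l$ can be diagonalized simultaneously by a *-isomorphism $\iota$. Using $\iota(P_{\mathrm{u}_l}(x)) = \chi_{\{\iota(\mathrm{u}_l) = x\}}$ (see Theorem 2.1), we get

$$\iota\big(\sigma_l^{\mu}(Z(\check{\mathrm{u}}_l))\big) = \sum_{x \in \mathrm{sp}(\mathrm{u}_l)} \iota\big(\sigma_l^{\mu}(Z(x))\big)\, \iota\big(P_{\mathrm{u}_l}(x)\big) = \sum_{x \in \mathrm{sp}(\mathrm{u}_l)} \mathrm{Tr}\big(\varrho_l^{\mu} Z(x)\big) \chi_{\{\iota(\mathrm{u}_l) = x\}}$$

$$= \mathrm{Tr}\Big[\varrho_l^{\mu} \sum_{x \in \mathrm{sp}(\mathrm{u}_l)} Z(x) \chi_{\{u_l = x\}}\Big] = \mathrm{Tr}\big[\varrho_l^{\mu} Z(u_l)\big],$$

where we have used the definitions (41) and (52). Since $\mathrm{u}_l \in \mathcal{Y}_l$, $u_l = \iota(\mathrm{u}_l)$ is obviously a function of $\Delta y_1, \ldots, \Delta y_l$. This completes the proof. ■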

B. Dynamics of risk-sensitive information density matrix 

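The objective here is to derive a recursive equation for $\varrho_l^{\mu}$. Let $X$ be an element of $\mathcal{M}$. A similar calculation to Eq. (26) yields the following difference equation for $j_l^{\mu}(X) := V^{\mu}(l)^* X V^{\mu}(l)$:

$$\Delta j_l^{\mu}(X) = j_{l-1}^{\mu}\big(\mathcal{L}^{\mu}(X, \check{\mathrm{u}}_{l-1})\big)\Delta t(l) + j_{l-1}^{\mu}\big(\mathcal{J}^{\mu}(X, \check{\mathrm{u}}_{l-1})\big)\Delta W(l),$$

where

$$\mathcal{L}^{\mu}(X, u) := e^{\mu_1\lambda^2 K(u)/2}\big[M^{+*}XM^{+} + \lambda^2 M^{\circ *}XM^{\circ} + XM^{\circ} + M^{\circ *}X\big] e^{\mu_1\lambda^2 K(u)/2} + \frac{1}{\lambda^2}\big(e^{\mu_1\lambda^2 K(u)/2} X e^{\mu_1\lambda^2 K(u)/2} - X\big) \in \mathcal{M},$$

$$\mathcal{J}^{\mu}(X, u) := e^{\mu_1\lambda^2 K(u)/2}\big[\lambda^2 M^{+*}XM^{\circ} + \lambda^2 M^{\circ *}XM^{+} + XM^{+} + M^{+*}X\big] e^{\mu_1\lambda^2 K(u)/2} \in \mathcal{M}.$$

Note that $K(u) = |X_e - u|^2$. Since $j_l^{\mu}(X)$ is an element of the commutant $\mathcal{C}_l'$, we can define the quantum conditional expectation $\check{\sigma}_l^{\mu}(X) := \mathbb{P}(j_l^{\mu}(X) \mid \mathcal{C}_l)$. This satisfies the following difference equation

$$\Delta\check{\sigma}_l^{\mu}(X) = \check{\sigma}_{l-1}^{\mu}\big(\mathcal{L}^{\mu}(X, \check{\mathrm{u}}_{l-1})\big)\Delta t(l) + \check{\sigma}_{l-1}^{\mu}\big(\mathcal{J}^{\mu}(X, \check{\mathrm{u}}_{l-1})\big)\Delta W(l).$$

Eq. (51) can now be written as $\sigma_l^{\mu}(X) = U(l)^* \check{\sigma}_l^{\mu}(X) U(l)$. This means we find the following difference equation

$$\Delta\sigma_l^{\mu}(X) = U(l-1)^* M_l^*\, \Delta\check{\sigma}_l^{\mu}(X)\, M_l U(l-1)$$

$$= U(l-1)^* \check{\sigma}_{l-1}^{\mu}\big(\mathcal{L}^{\mu}(X, \check{\mathrm{u}}_{l-1})\big) U(l-1)\Delta t(l) + U(l-1)^* \check{\sigma}_{l-1}^{\mu}\big(\mathcal{J}^{\mu}(X, \check{\mathrm{u}}_{l-1})\big) U(l-1)\, U(l)^*\Delta W(l) U(l)$$

$$= \sigma_{l-1}^{\mu}\big(\mathcal{L}^{\mu}(X, \mathrm{u}_{l-1})\big)\Delta t(l) + \sigma_{l-1}^{\mu}\big(\mathcal{J}^{\mu}(X, \mathrm{u}_{l-1})\big)\Delta Y(l), \tag{53}$$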

where in the last step Eq. (p6|) was used. 
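We can now represent Eq. (53) in terms of the unnormalized risk-sensitive information density matrix $\varrho_l^{\mu}$. Since $\sigma_l^{\mu}(X)$, $\Delta t(l)$, and $\Delta Y(l)$ are elements in $\mathcal{Y}_l$, they can be simultaneously diagonalized by a *-isomorphism $\iota$, which leads to

$$\Delta\iota\big(\sigma_l^{\mu}(X)\big) = \iota\big(\sigma_{l-1}^{\mu}(\mathcal{L}^{\mu}(X, \mathrm{u}_{l-1}))\big)\Delta t_l + \iota\big(\sigma_{l-1}^{\mu}(\mathcal{J}^{\mu}(X, \mathrm{u}_{l-1}))\big)\Delta y_l,$$

where $\Delta t_l = \iota(\Delta t(l))$ and $\Delta y_l = \iota(\Delta Y(l))$. It then follows from Lemma 5.2 that the above equation leads to

$$\Delta\varrho_l^{\mu} = \mathcal{L}^{\mu}(\varrho_{l-1}^{\mu}, u_{l-1})\Delta t_l + \mathcal{J}^{\mu}(\varrho_{l-1}^{\mu}, u_{l-1})\Delta y_l, \qquad \varrho_0^{\mu} = \rho, \tag{54}$$

where $u_{l-1} = \iota(j_{l-1}(\check{\mathrm{u}}_{l-1})) = \iota(\mathrm{u}_{l-1})$ is a function of $\Delta y_1, \ldots, \Delta y_{l-1}$. The operators $\mathcal{L}^{\mu}$ and $\mathcal{J}^{\mu}$ acting on density matrices are defined as follows:

$$\mathcal{L}^{\mu}(\varrho, u) := M^{+}H(\varrho, u)M^{+*} + \lambda^2 M^{\circ}H(\varrho, u)M^{\circ *} + M^{\circ}H(\varrho, u) + H(\varrho, u)M^{\circ *} + \frac{1}{\lambda^2}\big(H(\varrho, u) - \varrho\big),$$

$$\mathcal{J}^{\mu}(\varrho, u) := \lambda^2 M^{+}H(\varrho, u)M^{\circ *} + \lambda^2 M^{\circ}H(\varrho, u)M^{+*} + M^{+}H(\varrho, u) + H(\varrho, u)M^{+*},$$

where $H(\varrho, u) := e^{\mu_1\lambda^2 K(u)/2}\, \varrho\, e^{\mu_1\lambda^2 K(u)/2}$, as follows from the Heisenberg-picture operators above by the duality $\mathrm{Tr}[\varrho\, \mathcal{L}^{\mu}(X, u)] = \mathrm{Tr}[\mathcal{L}^{\mu}(\varrho, u) X]$.

Eq. (54) is a simple recursion for a $2 \times 2$ matrix and is thus easily implementable on a digital computer. Note that the operators $\mathcal{L}^{\mu}$ and $\mathcal{J}^{\mu}$ reduce when $\mu_1 = 0$ to $\mathcal{L}^{0} = \mathcal{L}$ and $\mathcal{J}^{0} = \mathcal{J}$, where $\mathcal{L}$ and $\mathcal{J}$ are given in Eqs. (38) and (39). This implies that the solution of Eq. (54) converges to that of Eq. (37) when $\mu_1$ goes to zero.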
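To make the remark about implementability concrete, the following is a minimal sketch of one step of the recursion (54). It rests on several assumptions that are not fixed by the text at this point: the estimated observable is $X_e = \sigma_z$ with a real control value $u$, so that $K(u)$ is diagonal; the coefficient matrices are those of the dispersive example of Section VII-A as read here, $M^{+} = (\sin(g\lambda)/\lambda)\sigma_z$ and $M^{\circ} = ((\cos(g\lambda) - 1)/\lambda^2) I$; and all function names are hypothetical.

```python
import math

def mm(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def dag(a):
    """Conjugate transpose of a 2x2 matrix."""
    return [[complex(a[j][i]).conjugate() for j in range(2)] for i in range(2)]

def lin(*terms):
    """Linear combination: lin((c1, A1), (c2, A2), ...) = c1*A1 + c2*A2 + ..."""
    return [[sum(c * A[i][j] for c, A in terms) for j in range(2)]
            for i in range(2)]

SZ = [[1, 0], [0, -1]]
I2 = [[1, 0], [0, 1]]

def dispersive_M(g, lam):
    """Assumed dispersive-model coefficients (M^+, M^o)."""
    s = math.sin(g * lam) / lam
    c = (math.cos(g * lam) - 1.0) / lam ** 2
    return lin((s, SZ)), lin((c, I2))

def H(rho, u, mu1, lam):
    """H(rho,u) = e^{mu1 lam^2 K(u)/2} rho e^{mu1 lam^2 K(u)/2}, K(u) = |sigma_z - u|^2."""
    e = [[math.exp(mu1 * lam ** 2 * (1 - u) ** 2 / 2), 0.0],
         [0.0, math.exp(mu1 * lam ** 2 * (1 + u) ** 2 / 2)]]
    return mm(mm(e, rho), e)

def L_mu(rho, u, Mp, Mo, mu1, lam):
    """Drift operator acting on the density matrix."""
    h = H(rho, u, mu1, lam)
    return lin((1.0, mm(mm(Mp, h), dag(Mp))),
               (lam ** 2, mm(mm(Mo, h), dag(Mo))),
               (1.0, mm(Mo, h)), (1.0, mm(h, dag(Mo))),
               (1.0 / lam ** 2, h), (-1.0 / lam ** 2, rho))

def J_mu(rho, u, Mp, Mo, mu1, lam):
    """Innovation operator acting on the density matrix."""
    h = H(rho, u, mu1, lam)
    return lin((lam ** 2, mm(mm(Mp, h), dag(Mo))),
               (lam ** 2, mm(mm(Mo, h), dag(Mp))),
               (1.0, mm(Mp, h)), (1.0, mm(h, dag(Mp))))

def step(rho, u_prev, dy, Mp, Mo, mu1, lam):
    """One step of Eq. (54): rho_l = rho_{l-1} + L^mu * lam^2 + J^mu * dy."""
    return lin((1.0, rho),
               (lam ** 2, L_mu(rho, u_prev, Mp, Mo, mu1, lam)),
               (dy, J_mu(rho, u_prev, Mp, Mo, mu1, lam)))
```

Setting `mu1 = 0.0` makes `H` the identity map, so `step` then reproduces the risk-neutral recursion (37) with the operators (38) and (39).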



C. Calculating the risk-sensitive estimator 

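We will now represent the cost function $F$ of Eq. (44) in terms of $\varrho_l^{\mu}$ only. To this end, we define a new state $\mathbb{Q}^l$ on $\mathcal{M} \otimes \mathcal{W}_N$ by

$$\mathbb{Q}^l(X) := \mathbb{P}\big[U(l) X U(l)^*\big],$$

for all $X \in \mathcal{M} \otimes \mathcal{W}_N$. Since $\mathcal{Y}_l$ is a commutative *-subalgebra of $\mathcal{M} \otimes \mathcal{W}_N$, we can apply Theorem 2.1 to $(\mathcal{Y}_l, \mathbb{Q}^l)$. That is, there exists a classical probability space $(\Omega_l, \mathcal{F}_l, Q^l)$ and a *-isomorphism $\iota : \mathcal{Y}_l \to L^{\infty}(\Omega_l, \mathcal{F}_l, Q^l)$ such that $\mathbb{Q}^l(A) = \mathbf{E}_{Q^l}[\iota(A)]$ for all $A \in \mathcal{Y}_l$. We now have the following theorem.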

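Theorem 5.1: The cost function in Eq. (44) can be written as

$$F(\mathrm{u}_1, \ldots, \mathrm{u}_l) = \mathbf{E}_{Q^l}\Big[\mathrm{Tr}\big(\varrho_l^{\mu} e^{\mu_2 |X_e - u_l|^2}\big)\Big], \tag{55}$$

where $u_l = \iota(\mathrm{u}_l)$ is a function of the measurement data $\Delta y_1, \ldots, \Delta y_l$, and $\varrho_l^{\mu}$ is the risk-sensitive information density matrix that satisfies Eq. (54).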

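Proof: We define $\check{\mathrm{u}}_l = j_l^{-1}(\mathrm{u}_l) \in \mathcal{C}_l$ as before. Since $|j_l(X_e) - \mathrm{u}_l|^2 = j_l(|X_e - \check{\mathrm{u}}_l|^2)$, we find

$$F(\mathrm{u}_1, \ldots, \mathrm{u}_l) = \mathbb{P}\Big[R(l)^* e^{\mu_2 j_l(|X_e - \check{\mathrm{u}}_l|^2)} R(l)\Big] = \mathbb{P}\Big[R(l)^* U(l)^* e^{\mu_2 |X_e - \check{\mathrm{u}}_l|^2} U(l) R(l)\Big] = \mathbb{P}\Big[U^{\mu}(l)^* e^{\mu_2 |X_e - \check{\mathrm{u}}_l|^2} U^{\mu}(l)\Big],$$

where $U^{\mu}(l)$ is defined by Eq. (48). Using Eq. (50) in Lemma 5.1 and the tower property of the conditional expectation, we find

$$F(\mathrm{u}_1, \ldots, \mathrm{u}_l) = \mathbb{P}\Big[V^{\mu}(l)^* e^{\mu_2 |X_e - \check{\mathrm{u}}_l|^2} V^{\mu}(l)\Big] = \mathbb{P}\Big[\mathbb{P}\big(V^{\mu}(l)^* e^{\mu_2 |X_e - \check{\mathrm{u}}_l|^2} V^{\mu}(l) \,\big|\, \mathcal{C}_l\big)\Big]$$

$$= \mathbb{P}\Big[U(l)\, U(l)^*\, \mathbb{P}\big(V^{\mu}(l)^* e^{\mu_2 |X_e - \check{\mathrm{u}}_l|^2} V^{\mu}(l) \,\big|\, \mathcal{C}_l\big)\, U(l)\, U(l)^*\Big] = \mathbb{Q}^l\Big[\sigma_l^{\mu}\big(e^{\mu_2 |X_e - \check{\mathrm{u}}_l|^2}\big)\Big],$$

where we have used the definition of $\sigma_l^{\mu}$ in Eq. (51). Note that the above conditional expectation is well defined due to $X_e - \check{\mathrm{u}}_l \in \mathcal{C}_l'$ and $V^{\mu}(l) \in \mathcal{C}_l'$. Let $u_l$ be given by $u_l = \iota(j_l(\check{\mathrm{u}}_l)) = \iota(\mathrm{u}_l)$. It now follows from Lemma 5.2 that

$$\iota\Big(\sigma_l^{\mu}\big(e^{\mu_2 |X_e - \check{\mathrm{u}}_l|^2}\big)\Big) = \mathrm{Tr}\Big[\varrho_l^{\mu} e^{\mu_2 |X_e - u_l|^2}\Big].$$

Consequently, the cost function can be written as

$$F(\mathrm{u}_1, \ldots, \mathrm{u}_l) = \mathbf{E}_{Q^l}\Big[\iota\Big(\sigma_l^{\mu}\big(e^{\mu_2 |X_e - \check{\mathrm{u}}_l|^2}\big)\Big)\Big] = \mathbf{E}_{Q^l}\Big[\mathrm{Tr}\Big(\varrho_l^{\mu} e^{\mu_2 |X_e - u_l|^2}\Big)\Big].$$

This completes the proof. ■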



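As a result of Theorem 5.1 our estimation problem is now cast as a classical optimal control problem. The resulting problem can be solved systematically by dynamic programming. We will only provide a brief summary of this. Consider the following optimal expected cost-to-go $f_l(\varrho)$ at time $l$, given that $\varrho_l^{\mu} = \varrho$:

$$f_l(\varrho) = \inf_{u_l, \ldots, u_N} \mathbf{E}_{Q^N}\Big[\mathrm{Tr}\big[\varrho_N^{\mu} e^{\mu_2 |X_e - u_N|^2}\big] \,\Big|\, \varrho_l^{\mu} = \varrho\Big], \qquad f_{N+1}(\varrho) = \mathrm{Tr}(\varrho).$$

This leads to the following dynamic programming equation; denoting Eq. (54) simply as $\varrho_l^{\mu} = \Gamma(\varrho_{l-1}^{\mu}, u_{l-1}, \Delta y_l)$, we have

$$f_{l-1}(\varrho) = \min_{u_{l-1}} \mathbf{E}_{Q^N}\big[f_l(\Gamma(\varrho, u_{l-1}, \Delta y_l))\big] = \min_{u_{l-1}} \frac{1}{2}\big[f_l(\Gamma(\varrho, u_{l-1}, \lambda)) + f_l(\Gamma(\varrho, u_{l-1}, -\lambda))\big].$$

Note here that $\mathrm{Prob}_{Q^N}(\Delta y_l = \lambda) = \mathbb{Q}^N[U(l)^* P_{+} U(l)] = 1/2$. We can run the above algorithm efficiently in a digital computer and obtain the optimal sequence $u_l^{\mathrm{opt}}$ $(l = 1, \ldots, N)$, which yields $\hat{X}_e^{\mu}(l) = \iota^{-1}(u_l^{\mathrm{opt}})$ through a verification theorem (e.g., see [25]). Theorem 6.1 below will lead to a robustness result for the risk-sensitive estimator.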

Remark 5.3: Running the dynamic programming recursion on a digital computer is very costly computationally. Therefore we define a suboptimal risk-sensitive estimator by

$$\hat{X}_e^{\mu,\mathrm{sub}}(l) := \operatorname*{argmin}_{\mathrm{u}_l \in \mathcal{Y}_l} F\big(\hat{X}_e^{\mu,\mathrm{sub}}(1), \ldots, \hat{X}_e^{\mu,\mathrm{sub}}(l-1), \mathrm{u}_l\big). \tag{56}$$

That is, $\hat{X}_e^{\mu,\mathrm{sub}}(l)$ is to be calculated based on the assumption that we have already performed the above minimization procedure up to time $l-1$ and obtained the suboptimal risk-sensitive estimators $\hat{X}_e^{\mu,\mathrm{sub}}(i) \in \mathcal{Y}_i$ $(i = 1, \ldots, l-1)$. As shown in [7] (Theorems 2.2 and 4.2), the minimizer of the trace function inside the expectation in Eq. (55), $u_l^{\min}$, leads to the suboptimal risk-sensitive estimator $\hat{X}_e^{\mu,\mathrm{sub}}(l) = \iota^{-1}(u_l^{\min})$. Hence our algorithm in this case is represented simply as follows:

$$\iota\big(\hat{X}_e^{\mu,\mathrm{sub}}(l)\big) = \operatorname*{argmin}_{u_l} \mathrm{Tr}\Big[\varrho_l^{\mu} e^{\mu_2 |X_e - u_l|^2}\Big], \qquad \varrho_l^{\mu} = \Gamma\big(\varrho_{l-1}^{\mu}, \iota(\hat{X}_e^{\mu,\mathrm{sub}}(l-1)), \Delta y_l\big), \tag{57}$$

which is of the same structure as the classical algorithm presented in Section I-A. In Theorem 6.2 we will derive a bound for the conditional estimation error. The suboptimal risk-sensitive estimator minimizes this error bound. This provides a sound theoretical foundation for the suboptimal risk-sensitive estimator. Since the algorithm (57) is computationally much cheaper than the dynamic programming equation, we will consistently use the suboptimal estimator in the example part, Section VII.
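The minimization in Eq. (57) is a one-dimensional problem at each time step. The sketch below illustrates it under the assumption that $X_e = \sigma_z$ and that the control value $u$ is real and searched over $[-1, 1]$; in that case the exponent is diagonal and only the diagonal entries of the information density matrix enter. The function names and the golden-section search are illustrative choices, not the authors' implementation.

```python
import math

def cost(u, r11, r22, mu2):
    """Tr[rho e^{mu2 |sigma_z - u|^2}] for real u; r11, r22 are the diagonal of rho."""
    return r11 * math.exp(mu2 * (1 - u) ** 2) + r22 * math.exp(mu2 * (1 + u) ** 2)

def argmin_u(r11, r22, mu2, lo=-1.0, hi=1.0, tol=1e-10):
    """Golden-section search; the cost is convex in u, so the minimizer is unique."""
    phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c = b - phi * (b - a)
        d = a + phi * (b - a)
        if cost(c, r11, r22, mu2) < cost(d, r11, r22, mu2):
            b = d  # minimizer lies in [a, d]
        else:
            a = c  # minimizer lies in [c, b]
    return (a + b) / 2

# Small mu2: close to the risk-neutral mean (r11 - r22) / (r11 + r22) = 0.5.
u_small = argmin_u(0.75, 0.25, 1e-3)
# Larger mu2: the estimate is pulled toward the middle of the spectrum of sigma_z.
u_large = argmin_u(0.75, 0.25, 1.0)
```

In a full filter loop, the value returned here would be fed into the next update of the information density matrix, as prescribed by the second relation in Eq. (57).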

VI. Quantum uncertain systems and robustness of the risk-sensitive filter 

In realistic situations, we often have to deal with a system that includes some model un- 
certainty. From the classical case, we expect that the risk-sensitive estimator has an enhanced 
robustness property against such uncertainty. In this section, we first describe a class of uncertain 
quantum systems for which the uncertainty is quantified by the quantum relative entropy. We 
will then show robustness properties of the estimator. 
A. Quantum uncertain systems

Uncertainty can enter the system in many ways. It could for instance be the case that the state $\psi$ is unknown to us. The uncertainty then enters the system density matrix $\rho$ through the relation $\mathbb{P}(X) = \psi \otimes \phi^{\otimes N}(X) = \mathrm{Tr}[X(\rho \otimes (\Phi\Phi^*)^{\otimes N})]$. We assume that the field state is known and fixed to the vacuum $\phi$. This, however, is not the only way uncertainty can enter our model. We will also allow for uncertainty in the coefficients of the dynamics, i.e., the difference equation (24). We can push this uncertainty into the initial state, as described below.

Let $(\Omega, \mathcal{F}, P)$ be a classical probability space. Let $p$ be an element of $L^{\infty}(\Omega, \mathcal{F}, P)$, i.e. $p$ is a random variable on $(\Omega, \mathcal{F}, P)$. Let $\mathrm{p}$ be the operator on $L^2(\Omega, \mathcal{F}, P)$ given by pointwise multiplication with $p$, i.e.,

$$(\mathrm{p}f)(\omega) = p(\omega)f(\omega), \qquad f \in L^2(\Omega, \mathcal{F}, P),\ \omega \in \Omega.$$

We denote the commutative *-algebra of all such multiplication operators with functions in $L^{\infty}(\Omega, \mathcal{F}, P)$ by $\mathcal{P}$. On $\mathcal{P}$ we can define a state $\tau$ as integration with respect to the measure $P$, i.e., $\tau(\mathrm{p}) = \int_{\Omega} p(\omega)P(d\omega)$. For simplicity we will take the operator $\mathrm{p}$ to be self-adjoint, i.e. it is a multiplication with a real-valued function. Next, let $M^{i}$ $(i = \pm, +, -, \circ)$ be $\mathcal{M}$-valued functions on $\mathbb{C}$, i.e. $M^{i} : \mathbb{C} \to \mathcal{M}$, such that the matrix $M_l$ in Eq. (23) is unitary. Then, using the compositions of $M^{i}$ and $\mathrm{p} \in \mathcal{P}$, we can define the following difference equation

$$\Delta U(l) = \big[M^{\pm}(\mathrm{p})\Delta\Lambda(l) + M^{+}(\mathrm{p})\Delta A(l)^* + M^{-}(\mathrm{p})\Delta A(l) + M^{\circ}(\mathrm{p})\Delta t(l)\big] U(l-1), \qquad U(0) = I, \tag{58}$$

on the extended quantum probability space

$$(\mathcal{P} \otimes \mathcal{M} \otimes \mathcal{W}_N, \mathbb{P}) = (\mathcal{P} \otimes \mathcal{M} \otimes M^{\otimes N}, \tau \otimes \psi \otimes \phi^{\otimes N}).$$

We now assume that the state $\tau \otimes \psi$ is unknown to us. This means that Eq. (58) is equivalent to the difference equation (24) with coefficients that include the uncertain parameter $p$. That is, the uncertainty in the model has been pushed completely into the state $\tau \otimes \psi$.

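Now, let $\rho^{\mathrm{true}} = \rho_p^{\mathrm{true}} \otimes \rho_s^{\mathrm{true}}$ be the true density matrix corresponding to the unknown state $\tau \otimes \psi$. Then, the true filter is initialized to $\varrho_0 = \rho^{\mathrm{true}}$. However, as $\rho^{\mathrm{true}}$ is unknown, we fix a nominal density matrix $\rho^{\mathrm{nom}} = \rho_p^{\mathrm{nom}} \otimes \rho_s^{\mathrm{nom}}$, which in general differs from $\rho^{\mathrm{true}}$, and construct the nominal filter that starts from $\varrho_0 = \rho^{\mathrm{nom}}$. The nominal estimator of $j_l(X)$ is then given by

$$\pi_l^{\mathrm{nom}}(X) = \frac{\sigma_l^{\mathrm{nom}}(X)}{\sigma_l^{\mathrm{nom}}(I)}, \qquad \sigma_l^{\mathrm{nom}}(X) = \iota^{-1}\big(\mathrm{Tr}(\varrho_l^{\mathrm{nom}} X)\big), \qquad \varrho_0^{\mathrm{nom}} = \rho^{\mathrm{nom}}.$$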

Example 6.1: If $p$ is a discrete random variable that takes the values $p_i$ $(i = 1, \ldots, m)$ with unknown probability $r_i$, then the corresponding multiplication operator is $\mathrm{p} = \mathrm{diag}\{p_1, \ldots, p_m\} \in \mathcal{P}$, where the commutative *-algebra $\mathcal{P}$ is the set of $m \times m$ diagonal matrices, and the true density matrix is $\rho_p^{\mathrm{true}} = \mathrm{diag}\{r_1, \ldots, r_m\}$. To design a nominal filter, we choose a nominal density matrix of the form $\rho_p^{\mathrm{nom}} = \mathrm{diag}\{r'_1, \ldots, r'_m\}$. In general, $r_i \neq r'_i$. It is easily seen that the quantum relative entropy between the above two distributions is equal to the classical one:

$$R(\rho_p^{\mathrm{true}} \| \rho_p^{\mathrm{nom}}) = \sum_{i=1}^{m} r_i \log\frac{r_i}{r'_i} = R(\{r_i\} \| \{r'_i\}).$$

Note that an important example for a true density matrix is $\rho_p^{\mathrm{true}} = \mathrm{diag}\{1, 0, \ldots, 0\}$; that is, $p$ is not a random variable but an unknown deterministic system parameter $p = p_1$. If we have no information about $p$ at all, it is natural to take a uniform distribution $\rho_p^{\mathrm{nom}} = \mathrm{diag}\{1/m, \ldots, 1/m\}$ as the nominal distribution.
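For diagonal density matrices the relative entropy of Example 6.1 is a finite sum and can be evaluated directly. The snippet below is a small sketch using the usual convention $0 \log 0 = 0$; the function name `relative_entropy` is a hypothetical helper. It evaluates the case of a deterministic parameter with a uniform nominal distribution over $m = 20$ values, for which the sum reduces to $\log m$.

```python
import math

def relative_entropy(r, r_nom):
    """R(r || r') = sum_i r_i log(r_i / r'_i), with the convention 0 log 0 = 0."""
    total = 0.0
    for ri, qi in zip(r, r_nom):
        if ri == 0.0:
            continue            # 0 log 0 contributes nothing
        if qi == 0.0:
            return math.inf     # support of r not contained in support of r'
        total += ri * math.log(ri / qi)
    return total

m = 20
r_true = [1.0] + [0.0] * (m - 1)   # deterministic parameter p = p_1
r_nom = [1.0 / m] * m              # uniform nominal distribution
value = relative_entropy(r_true, r_nom)
```

Swapping the two arguments gives an infinite relative entropy, which shows why the nominal distribution should not assign zero probability to parameter values that the true distribution may take.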
B. Robustness properties of the risk-sensitive and suboptimal risk-sensitive estimators

The nominal estimator $\pi_l^{\mathrm{nom}}(X_e)$ differs from the true one $\pi_l^{\mathrm{true}}(X_e)$. Hence, $\pi_l^{\mathrm{nom}}(X_e)$ is no longer the optimal estimator in the sense of the mean square error and thus can possibly incur a large estimation error. However, as shown below, if one uses the nominal risk-sensitive estimator (given by Eq. (43)), the estimation error is guaranteed to be within a certain bound. This implies that the risk-sensitive estimator does have a robustness property against unknown perturbation of the system state and the system parameters.

The quantum relative entropy (1) will be used to express the robustness property. We here assume that the unknown true density matrix $\rho^{\mathrm{true}} = \rho_p^{\mathrm{true}} \otimes \rho_s^{\mathrm{true}}$ is within a certain distance from a known nominal density matrix $\rho^{\mathrm{nom}} = \rho_p^{\mathrm{nom}} \otimes \rho_s^{\mathrm{nom}}$:

$$R(\rho^{\mathrm{true}} \| \rho^{\mathrm{nom}}) < +\infty. \tag{59}$$

The following theorem will lead to a robustness property of the nominal risk-sensitive estimator defined in Eq. (43).

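Theorem 6.1: Let $\mathrm{u}_l$ $(l = 1, \ldots, N)$ be an element of $\mathcal{Y}_l$. Then, we have the following inequality

$$\mathbb{P}_{\mathrm{true}}\Big[\log\Big(R(N)^* e^{\mu_2 |j_N(X_e) - \mathrm{u}_N|^2} R(N)\Big)\Big] \le \log \mathbb{P}_{\mathrm{nom}}\Big[R(N)^* e^{\mu_2 |j_N(X_e) - \mathrm{u}_N|^2} R(N)\Big] + R(\rho^{\mathrm{true}} \| \rho^{\mathrm{nom}}), \tag{60}$$

where $R(N)$ is defined by Eq. (45), and $\mathbb{P}_{\mathrm{true}}(X) = \mathrm{Tr}[X(\rho^{\mathrm{true}} \otimes (\Phi\Phi^*)^{\otimes N})]$ and $\mathbb{P}_{\mathrm{nom}}(X) = \mathrm{Tr}[X(\rho^{\mathrm{nom}} \otimes (\Phi\Phi^*)^{\otimes N})]$.

Proof: Setting $\rho = \rho^{\mathrm{true}} \otimes (\Phi\Phi^*)^{\otimes N}$ and $\rho' = \rho^{\mathrm{nom}} \otimes (\Phi\Phi^*)^{\otimes N}$ in Eq. (3), we have

$$\mathbb{P}_{\mathrm{true}}(Z) \le \log \mathbb{P}_{\mathrm{nom}}(e^{Z}) + R(\rho^{\mathrm{true}} \| \rho^{\mathrm{nom}}), \qquad \forall Z \in \mathcal{P} \otimes \mathcal{M} \otimes \mathcal{W}_N,$$

where we have used the following additivity property:

$$R\big(\rho^{\mathrm{true}} \otimes (\Phi\Phi^*)^{\otimes N} \,\big\|\, \rho^{\mathrm{nom}} \otimes (\Phi\Phi^*)^{\otimes N}\big) = R(\rho^{\mathrm{true}} \| \rho^{\mathrm{nom}}) + R\big((\Phi\Phi^*)^{\otimes N} \,\big\|\, (\Phi\Phi^*)^{\otimes N}\big) = R(\rho^{\mathrm{true}} \| \rho^{\mathrm{nom}}).$$

Therefore, taking $Z = \log\big[R(N)^* e^{\mu_2 |j_N(X_e) - \mathrm{u}_N|^2} R(N)\big]$ yields the theorem. ■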
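The inequality used in this proof can be checked numerically in the commutative special case, where both density matrices and $Z$ are diagonal and the statement reduces to a classical entropy inequality for probability vectors. The sketch below covers only that special case, with arbitrarily chosen test data; the names `kl` and `bound_sides` are hypothetical.

```python
import math
import random

def kl(r, q):
    """Classical relative entropy sum_i r_i log(r_i / q_i), with 0 log 0 = 0."""
    return sum(ri * math.log(ri / qi) for ri, qi in zip(r, q) if ri > 0.0)

def bound_sides(r, q, z):
    """Return (E_true[z], log E_nom[e^z] + R(r || q)) for diagonal data."""
    lhs = sum(ri * zi for ri, zi in zip(r, z))
    rhs = math.log(sum(qi * math.exp(zi) for qi, zi in zip(q, z))) + kl(r, q)
    return lhs, rhs

random.seed(0)
r = [0.0, 0.01, 0.04, 0.1, 0.7, 0.1, 0.04, 0.01]   # "true" distribution
q = [1.0 / 8] * 8                                  # uniform "nominal" distribution
z = [random.uniform(0.0, 2.0) for _ in r]          # arbitrary diagonal Z
lhs, rhs = bound_sides(r, q, z)
```

The bound is tight when $z_i = \log(r_i / q_i)$, which indicates that the relative entropy term cannot be dropped in general.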
Eq. (60) is a quantum version of the classical robustness result [5], because the left hand side of Eq. (60) can be expanded up to second order in the estimation error as

$$\mathbb{P}_{\mathrm{true}}\Big[\mu_1\lambda^2 \sum_{i=1}^{N-1} |j_i(X_e) - \mathrm{u}_i|^2 + \mu_2 |j_N(X_e) - \mathrm{u}_N|^2\Big] + O\big(|j_l(X_e) - \mathrm{u}_l|^4\big)$$

$$\le \log \mathbb{P}_{\mathrm{nom}}\Big[R(N)^* e^{\mu_2 |j_N(X_e) - \mathrm{u}_N|^2} R(N)\Big] + R(\rho^{\mathrm{true}} \| \rho^{\mathrm{nom}}).$$

That is, as in the classical case, the nominal risk-sensitive estimator $\mathrm{u}_l = \hat{X}_e^{\mu,\mathrm{nom}}(l)$, defined by Eq. (43), does have a robustness property, because it minimizes the upper bound of the estimation error under the unknown true state $\mathbb{P}_{\mathrm{true}}$.

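We remark that the relative entropy in Eq. (60) can be written as

$$R(\rho^{\mathrm{true}} \| \rho^{\mathrm{nom}}) = R(\rho_p^{\mathrm{true}} \| \rho_p^{\mathrm{nom}}) + R(\rho_s^{\mathrm{true}} \| \rho_s^{\mathrm{nom}}).$$

The first term is a classical relative entropy, as shown in Example 6.1. Thus, if there is no uncertainty in the quantum state, the estimation error bound is written in terms of classical quantities only.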

We now change our focus to the suboptimal risk-sensitive estimator $\hat{X}_e^{\mu,\mathrm{sub}}(l)$ defined in Remark 5.3. The following theorem shows that the conditional estimation error at time $l$ also has an upper bound. This will lead to a robustness property for the nominal suboptimal risk-sensitive estimator of Remark 5.3.
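Theorem 6.2: Let $\mathrm{u}_l$ $(l = 1, \ldots, N)$ be an element of $\mathcal{Y}_l$. Then, we have the following inequality

$$\iota\Big(\mathbb{P}_{\mathrm{true}}\big[|j_l(X_e) - \mathrm{u}_l|^2 \,\big|\, \mathcal{Y}_l\big]\Big) \le \frac{1}{\mu_2}\log \mathrm{Tr}\Big[\rho_l^{\mu,\mathrm{nom}} e^{\mu_2 |X_e - u_l|^2}\Big] + \frac{1}{\mu_2} R\big(\rho_l^{\mathrm{true}} \,\big\|\, \rho_l^{\mu,\mathrm{nom}}\big), \tag{61}$$

where $\rho_l^{\mathrm{true}} = \varrho_l^{\mathrm{true}}/\mathrm{Tr}[\varrho_l^{\mathrm{true}}]$ and $\rho_l^{\mu,\mathrm{nom}} = \varrho_l^{\mu,\mathrm{nom}}/\mathrm{Tr}[\varrho_l^{\mu,\mathrm{nom}}]$ are the conditional density matrices corresponding to the true filter and the nominal risk-sensitive filter, and $u_l = \iota(\mathrm{u}_l)$.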

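Proof: Using the definition of the optimal estimator $\pi_l(X_e)$, the left-hand side in inequality (61) can be rewritten as

$$\iota\Big(\mathbb{P}_{\mathrm{true}}\big[j_l(|X_e - \check{\mathrm{u}}_l|^2) \,\big|\, \mathcal{Y}_l\big]\Big) = \iota\Big(\pi_l^{\mathrm{true}}\big(|X_e - \check{\mathrm{u}}_l|^2\big)\Big) = \mathrm{Tr}\Big[\rho_l^{\mathrm{true}} |X_e - u_l|^2\Big],$$

where the last equality follows directly from Lemma 5.2 with $u_l = \iota(j_l(\check{\mathrm{u}}_l)) = \iota(\mathrm{u}_l)$. Then, from Eq. (3) we have the assertion. ■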
The first term of the right-hand side in Eq. (61) is minimized when choosing the nominal suboptimal risk-sensitive estimator $u_l = \iota\big(\hat{X}_e^{\mu,\mathrm{sub}}(l)\big)$ given by Eq. (57). Theorem 6.2 therefore shows a robustness property of the suboptimal risk-sensitive estimator defined in Remark 5.3.

VII. Examples 

In this section, we study two examples in detail. The first example is a two-level atom that is 
coupled to the field via a dispersive interaction. This coupling can be obtained by putting the 
atom in a cavity that has a resonance frequency far detuned from the transition frequency of 
the two-level atom. The second example deals with a two-level atom that decays to the ground 
state due to spontaneous emission into its environment. We consider the situation where the 
quantum state of the two-level atom and a physical parameter are unknown to us. In particular, 
we employ the nominal suboptimal risk-sensitive estimator given by Eq. (57). We compare this
estimator with both the true risk-neutral and nominal risk-neutral estimators. 



A. Dispersive interaction model 

The interaction Hamiltonian (19) in the case of a dispersive interaction with the field is given by the following system matrices:

$$L_1 = 0, \qquad L_2 = i\sqrt{g}\,\sigma_z, \qquad L_3 = 0, \tag{62}$$

where $\sigma_z = \mathrm{diag}\{1, -1\}$ and $g > 0$ represents the interaction strength. From Eqs. (21) and (23), we see that the matrices $M^{i}$ $(i = \pm, +, -, \circ)$ are given by

$$M^{\pm}(g) = 0, \qquad M^{+}(g) = \frac{\sin(g\lambda)}{\lambda}\sigma_z, \qquad M^{-}(g) = -\frac{\sin(g\lambda)}{\lambda}\sigma_z, \qquad M^{\circ}(g) = \frac{\cos(g\lambda) - 1}{\lambda^2} I.$$

We assume that $g$ is a classical random variable that takes the values $g_i$ with unknown probabilities $\mathrm{Prob}(g_i) = r_i$. As seen in Section VI-A, $g$ can be regarded as an observable $\mathrm{g} = \mathrm{diag}\{g_1, \ldots, g_m\} \in \mathcal{P}$, where $\mathcal{P}$ is a commutative *-algebra given by the set of $m \times m$ diagonal matrices. The corresponding unknown true density matrix is $\rho_p^{\mathrm{true}} = \mathrm{diag}\{r_1, \ldots, r_m\}$. In particular, we now study a toy model in which $g$ can take 20 discrete values, $g_i = 0.4 + 0.03i$ $(i = 1, \ldots, 20)$. Moreover, we choose $\rho_p^{\mathrm{true}}$ to be given by

$$\rho_p^{\mathrm{true}} = \mathrm{diag}\{0, 0.01, 0.04, 0.1, 0.7, 0.1, 0.04, 0.01, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0\}, \tag{63}$$

which is illustrated in Fig. 1 (a1). For instance, $g$ takes $g_3 = 0.49$ with probability $\mathrm{Prob}(g_3) = 0.04$. Furthermore, we assume that the true system density matrix is given by

$$\rho_s^{\mathrm{true}} = \begin{pmatrix} 0.5 & 0.5 \\ 0.5 & 0.5 \end{pmatrix}. \tag{64}$$

Again, note that $\rho^{\mathrm{true}} = \rho_p^{\mathrm{true}} \otimes \rho_s^{\mathrm{true}}$ is unknown to us.
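As a sanity check on the dispersive-model coefficients, one can verify numerically that they produce a unitary one-step matrix $M_l$. The sketch below makes assumptions that are not spelled out in this section: a single time slice of the field is modeled as a two-level system with $\Delta A(l) = \lambda\sigma_-$, $\Delta A(l)^* = \lambda\sigma_+$ and $\Delta t(l) = \lambda^2$, and the coefficients are $M^{+} = (\sin(g\lambda)/\lambda)\sigma_z$, $M^{-} = -(\sin(g\lambda)/\lambda)\sigma_z$, $M^{\circ} = ((\cos(g\lambda) - 1)/\lambda^2) I$. All function names are hypothetical.

```python
import math

def kron(a, b):
    """Kronecker product of square matrices given as nested lists."""
    m = len(b)
    n = len(a) * m
    return [[a[i // m][j // m] * b[i % m][j % m] for j in range(n)]
            for i in range(n)]

def mm(a, b):
    """Matrix product."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def dag(a):
    """Conjugate transpose."""
    n = len(a)
    return [[complex(a[j][i]).conjugate() for j in range(n)] for i in range(n)]

def lin(*terms):
    """Linear combination c1*A1 + c2*A2 + ... of equally sized matrices."""
    n = len(terms[0][1])
    return [[sum(c * A[i][j] for c, A in terms) for j in range(n)]
            for i in range(n)]

I2 = [[1, 0], [0, 1]]
SZ = [[1, 0], [0, -1]]
SM = [[0, 0], [1, 0]]   # assumed field lowering operator: dA = lam * SM
SP = [[0, 1], [0, 0]]   # its adjoint: dA^* = lam * SP

def M_l(g, lam, sign_minus=-1.0):
    """One-step matrix I + M^+ dA^* + M^- dA + M^o dt on (atom) x (field slice)."""
    s = math.sin(g * lam) / lam
    c = (math.cos(g * lam) - 1.0) / lam ** 2
    Mp = lin((s, SZ))
    Mm = lin((sign_minus * s, SZ))
    Mo = lin((c, I2))
    return lin((1.0, kron(I2, I2)),
               (lam, kron(Mp, SP)),
               (lam, kron(Mm, SM)),
               (lam ** 2, kron(Mo, I2)))

def unitarity_defect(M):
    """Largest entry of |M^* M - I|."""
    P = mm(dag(M), M)
    n = len(M)
    return max(abs(P[i][j] - (1.0 if i == j else 0.0))
               for i in range(n) for j in range(n))

defect = unitarity_defect(M_l(0.55, math.sqrt(0.001)))
```

Under these assumptions the defect vanishes up to rounding, whereas flipping the sign of $M^{-}$ (via `sign_minus=+1.0`) breaks unitarity, so the relative sign between $M^{+}$ and $M^{-}$ matters.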

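Now, let us consider estimating the system observable $X_e = \sigma_z$. To design a nominal filter, we use the following nominal density matrix in $\mathcal{P} \otimes \mathcal{M}$:

$$\rho^{\mathrm{nom}} = \rho_p^{\mathrm{nom}} \otimes \rho_s^{\mathrm{nom}}, \qquad \rho_p^{\mathrm{nom}} = \mathrm{diag}\Big\{\frac{1}{20}, \ldots, \frac{1}{20}\Big\}, \qquad \rho_s^{\mathrm{nom}} = \begin{pmatrix} 0.5 & 0.25 \\ 0.25 & 0.5 \end{pmatrix}. \tag{65}$$

$\rho_p^{\mathrm{nom}}$ is depicted in Fig. 1 (a2). The nominal risk-neutral estimator $\pi_l^{\mathrm{nom}}(\sigma_z)$ and the nominal risk-sensitive one $\hat{\sigma}_z^{\mu,\mathrm{sub}}(l)$ are then calculated from Eq. (37) with $\varrho_0 = \rho^{\mathrm{nom}}$ and Eq. (57) with $\varrho_0^{\mu} = \rho^{\mathrm{nom}}$, respectively. The risk-sensitive parameters are chosen to be $(\mu_1, \mu_2) = (0.1, 0.182)$. Note that the filter equations include the composition $M^{i}(\mathrm{g})$ and are driven by the true output data $\Delta y_l$. We compare those two nominal estimators with the ideal true risk-neutral estimator $\pi_l^{\mathrm{true}}(\sigma_z)$, which is calculated from Eq. (37) with $\varrho_0 = \rho^{\mathrm{true}}$. To do this, we use the averaged total estimation errors

$$\Delta_{\mathrm{rn}} = \frac{1}{N}\sum_{l=1}^{N} \big|\iota(\pi_l^{\mathrm{true}}(\sigma_z)) - \iota(\pi_l^{\mathrm{nom}}(\sigma_z))\big|, \qquad \Delta_{\mathrm{rs}} = \frac{1}{N}\sum_{l=1}^{N} \big|\iota(\pi_l^{\mathrm{true}}(\sigma_z)) - \iota(\hat{\sigma}_z^{\mu,\mathrm{sub}}(l))\big|. \tag{66}$$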

The histograms of these values are depicted in Fig. 1 (b) for 200 sample paths with $\lambda^2 = 0.001$ and $N = 2000$. Overall, $\Delta_{\mathrm{rs}}$ is smaller than $\Delta_{\mathrm{rn}}$, showing the better performance of the risk-sensitive estimator over the risk-neutral one. Figs. 1 (c) and (d) illustrate an example of sample paths of the estimators; in Fig. 1 (c) the solid line shows $\iota(\hat{\sigma}_z^{\mu,\mathrm{sub}}(l))$, while in Fig. 1 (d) the solid line is $\iota(\pi_l^{\mathrm{nom}}(\sigma_z))$. In both figures, the thick dotted line is $\iota(\pi_l^{\mathrm{true}}(\sigma_z))$. In Fig. 1 (c), both estimators are quite close to each other in spite of the difference in their initial states. On the other hand, as depicted in Fig. 1 (d), the nominal risk-neutral estimator fails in the estimation, although it finally converges to the true value $-1$. In summary, the risk-sensitive estimator outperforms the nominal risk-neutral estimator in the presence of uncertainty.
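The averaged total estimation error of Eq. (66) is a plain time average of absolute differences between two estimator paths. A minimal sketch follows; the sample paths are made-up placeholder numbers, not data from the paper, and `averaged_error` is a hypothetical helper name.

```python
def averaged_error(true_path, est_path):
    """(1/N) * sum over l of |true_l - est_l|, as in Eq. (66)."""
    if len(true_path) != len(est_path):
        raise ValueError("paths must have the same length")
    n = len(true_path)
    return sum(abs(a - b) for a, b in zip(true_path, est_path)) / n

# Placeholder sample paths of length N = 4 (illustrative values only).
true_path = [0.0, -0.2, -0.6, -1.0]
nominal_path = [0.0, 0.1, -0.2, -0.9]
delta = averaged_error(true_path, nominal_path)
```

Applying the same function to the risk-neutral and the risk-sensitive paths for each simulated trajectory would give the two quantities whose histograms are compared in Fig. 1 (b).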

Remark 7.1: The performance of the nominal estimator depends on the magnitude of uncertainty. For example, if there is no uncertainty in the nominal distribution, the nominal risk-neutral estimator coincides with the true optimal estimator and clearly works better than the risk-sensitive one. However, in the presence of some uncertainty, the risk-neutral estimator is no longer optimal and will be inferior to the risk-sensitive one, as seen in Fig. 1. To make a more quantitative observation, we consider the following nominal distribution characterized by one parameter $\beta \in [0, 1]$ that represents the uncertainty magnitude:

$$(\rho_p^{\mathrm{nom},\beta})_{1,1} = (\rho_p^{\mathrm{nom},\beta})_{9,9} = \cdots = (\rho_p^{\mathrm{nom},\beta})_{20,20} = 0.05\beta,$$

$$(\rho_p^{\mathrm{nom},\beta})_{2,2} = (\rho_p^{\mathrm{nom},\beta})_{8,8} = 0.04\beta + 0.01, \qquad (\rho_p^{\mathrm{nom},\beta})_{3,3} = (\rho_p^{\mathrm{nom},\beta})_{7,7} = 0.01\beta + 0.04,$$

$$(\rho_p^{\mathrm{nom},\beta})_{4,4} = (\rho_p^{\mathrm{nom},\beta})_{6,6} = -0.05\beta + 0.1, \qquad (\rho_p^{\mathrm{nom},\beta})_{5,5} = -0.65\beta + 0.7,$$

$$\rho_s^{\mathrm{nom},\beta} = \begin{pmatrix} 0.5 & 0.5 - 0.25\beta \\ 0.5 - 0.25\beta & 0.5 \end{pmatrix}.$$

When $\beta = 0$, the nominal distribution is equal to the true one: $\rho_p^{\mathrm{nom},0} \otimes \rho_s^{\mathrm{nom},0} = \rho_p^{\mathrm{true}} \otimes \rho_s^{\mathrm{true}}$. Hence $\beta = 0$ implies there is no uncertainty. On the other hand, when $\beta = 1$, the nominal distribution is the one given in Eq. (65). We consider the nominal risk-neutral estimator and the risk-sensitive one with $\mu_1 = 0.01$, $\mu_2 = 0.05$. Note that these two estimators are close to each other due to the small risk-sensitive parameter. To evaluate their performances, we calculate the averaged total estimation errors (66) and compare them. In Fig. 2 (a1), the horizontal axis shows the uncertainty magnitude $\beta$, while the vertical axis shows the average of $\Delta_{\mathrm{rn}}$ and $\Delta_{\mathrm{rs}}$ over 100 sample paths, which are denoted by $\bar{\Delta}_{\mathrm{rn}}$ and $\bar{\Delta}_{\mathrm{rs}}$, respectively. Fig. 2 (a2) shows examples of the nominal parameter distribution $\rho_p^{\mathrm{nom},\beta}$. The risk-sensitive estimator clearly shows a better performance than the risk-neutral one, except in the case of a small $\beta$.
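The one-parameter family of nominal distributions in Remark 7.1 can be written down and checked in a few lines. The sketch below is only an illustration; `rho_p_nom` and `rho_s_nom` are hypothetical helper names, and Python's zero-based list indices are shifted by one relative to the matrix indices in the text.

```python
def rho_p_nom(beta):
    """Diagonal of the nominal parameter distribution for uncertainty level beta."""
    d = [0.05 * beta] * 20              # entries (1,1) and (9,9), ..., (20,20)
    d[1] = d[7] = 0.04 * beta + 0.01    # entries (2,2) and (8,8)
    d[2] = d[6] = 0.01 * beta + 0.04    # entries (3,3) and (7,7)
    d[3] = d[5] = -0.05 * beta + 0.1    # entries (4,4) and (6,6)
    d[4] = -0.65 * beta + 0.7           # entry (5,5)
    return d

def rho_s_nom(beta):
    """Nominal system density matrix for uncertainty level beta."""
    off = 0.5 - 0.25 * beta
    return [[0.5, off], [off, 0.5]]

# Parameter grid of the dispersive example: g_i = 0.4 + 0.03 i, i = 1, ..., 20.
g_values = [0.4 + 0.03 * i for i in range(1, 21)]

# True distribution of Eq. (63), which the family reproduces at beta = 0.
rho_p_true = [0, 0.01, 0.04, 0.1, 0.7, 0.1, 0.04, 0.01] + [0] * 12
```

The entries sum to one for every value of the parameter, the choice 0 reproduces the true distribution (63), and the choice 1 gives the uniform nominal distribution of Eq. (65).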

Fig. 1. For the dispersive interaction model of the atom, (a) the true and nominal parameter distributions, (b) the histogram of the averaged total estimation errors, (c) sample paths of the nominal risk-sensitive estimator (solid line) and the true risk-neutral one (thick dotted line), and (d) sample paths of the nominal risk-neutral estimator (solid line) and the true risk-neutral one (thick dotted line). For the figures (c) and (d), the notation $\iota(\cdot)$ is omitted.

Remark 7.2: The robustness property of the risk-sensitive filter is based on the fact that the estimation error is upper bounded, as presented in Theorems 6.1 and 6.2. Fig. 2 (b) illustrates sample paths of the conditional estimation error and its upper bound given in Theorem 6.2:

$$E_l := \iota\Big(\mathbb{P}_{\mathrm{true}}\big[|j_l(X_e) - \mathrm{u}_l|^2 \,\big|\, \mathcal{Y}_l\big]\Big) \le \frac{1}{\mu_2}\log \mathrm{Tr}\Big[\rho_l^{\mu,\mathrm{nom}} e^{\mu_2 |X_e - u_l|^2}\Big] + \frac{1}{\mu_2} R\big(\rho_l^{\mathrm{true}} \,\big\|\, \rho_l^{\mu,\mathrm{nom}}\big). \tag{67}$$

From this, we see that the bound is much larger than the actual estimation error. This is very 
similar to the classical case where one also often finds a very conservative upper bound. 

Remark 7.3: As in the classical case, there is no theoretical procedure to determine the best risk-sensitive parameters $(\mu_1, \mu_2)$. We here only maintain that a non-zero $\mu_1$, the weighting parameter of the running estimation error cost, is actually helpful in obtaining a high-quality risk-sensitive filter. To show this fact, we apply a nominal risk-sensitive filter with $\mu_1 = 0$, initialized to Eq. (65), to the same uncertain system as discussed above. For this filter, $\mu_2 = 0.281$ appears to be the best parameter. Fig. 2 (c) illustrates the mean values over 200 sample paths of the conditional estimation error $E_l$ in Eq. (67). The upper dotted line and the lower solid one correspond to the risk-sensitive estimation with $(\mu_1, \mu_2) = (0.0, 0.281)$ and $(\mu_1, \mu_2) = (0.1, 0.182)$, respectively. This shows that a non-zero $\mu_1$ does improve the performance of the estimator.



B. Spontaneous emission model 

In the case of spontaneous decay, the interaction Hamiltonian (19) is given by

$$L_1 = 0, \qquad L_2 = i\sqrt{\epsilon}\,\sigma_-, \qquad L_3 = 0, \tag{68}$$

where $\sigma_-$ is defined in Eq. (16) and $\epsilon > 0$ represents the emission rate. The matrices $M^{i}$ $(i = \pm, +, -, \circ)$ are determined from Eqs. (21) and (23) and read

$$M^{\pm}(\epsilon) = (1 - \cos(\epsilon\lambda))\sigma_z, \qquad M^{+}(\epsilon) = \frac{\sin(\epsilon\lambda)}{\lambda}\sigma_-, \qquad M^{-}(\epsilon) = -\frac{\sin(\epsilon\lambda)}{\lambda}\sigma_+, \qquad M^{\circ}(\epsilon) = \frac{\cos(\epsilon\lambda) - 1}{\lambda^2}\sigma_+\sigma_-.$$

Fig. 2. For the dispersive interaction model of the atom, (a1) the averaged total estimation errors of the nominal risk-sensitive filter (solid line) and the nominal risk-neutral one (dotted line), (a2) examples of the nominal parameter distributions, (b) the conditional error (dotted line) and the guaranteed error bound (solid line) in Theorem 6.2, and (c) the averaged conditional estimation errors of the nominal risk-sensitive filters with $(\mu_1, \mu_2) = (0.0, 0.281)$ (upper dotted line) and $(\mu_1, \mu_2) = (0.1, 0.182)$ (lower solid line).

Similar to the dispersive interaction case, we here assume that e behaves as a classical discrete 
random variable with an unknown probability distribution; e is then replaced by an observable 
e = diag{ei, . . . , e m } S V. In particular, we assume e, = 0.2 + 0.04i (i = 1, . . . , 20) with the 
true density matrix 

p* ruc = diag{0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.01, 0.04, 0.9, 0.04, 0.01, 0}, 

which is illustrated in Fig. 3 (a). The system true density matrix $\rho_s^{\rm true}$ is given by Eq. (64). 
For a nominal density matrix, we take $\rho_p^{\rm nom}$ in Eq. (65) and assume that $\rho_s^{\rm nom} = \rho_s^{\rm true}$. In the 
above setting, we consider estimating $X_e = \sigma_y := i(\sigma_- - \sigma_+)$, investigate the performance 
of the nominal risk-sensitive filter, and compare it with the nominal risk-neutral one. The 
risk-sensitive parameters are chosen as $(\mu_1, \mu_2) = (0.15, 0.25)$. Fig. 3 (b) shows the histogram of 
the averaged errors $\Delta_{\rm rs}$ and $\Delta_{\rm rn}$ for 200 sample paths with $\lambda^2 = 0.001$ and $N = 5000$. Fig. 3 
(c) shows sample paths of the true risk-neutral estimator $\iota(\pi_l^{\rm true}(\sigma_y))$ and of the nominal 
risk-sensitive estimator, while Fig. 3 (d) shows $\iota(\pi_l^{\rm true}(\sigma_y))$ and the nominal risk-neutral 
estimator $\iota(\pi_l^{\rm nom}(\sigma_y))$. These figures clearly show that the nominal risk-sensitive estimator 
is superior to the nominal risk-neutral estimator. 
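
The parameter setup described above can be checked numerically. The following is a minimal sketch in plain Python, not the authors' simulation code; the function names are hypothetical, and only the grid $\epsilon_i = 0.2 + 0.04i$ and the diagonal entries of the true density matrix are taken from the text. It confirms that the diagonal sums to one and computes the mean and spread of the emission-rate parameter under the true distribution.

```python
# Minimal sketch (hypothetical helper names); numbers are copied from the text.

def emission_rate_grid(m=20):
    """Return [eps_1, ..., eps_m] with eps_i = 0.2 + 0.04 * i."""
    return [0.2 + 0.04 * i for i in range(1, m + 1)]


def true_parameter_weights():
    """Diagonal of the true parameter density matrix quoted in the text."""
    return [0.0] * 14 + [0.01, 0.04, 0.9, 0.04, 0.01, 0.0]


def mean_and_std(values, weights):
    """Mean and standard deviation of a discrete random variable."""
    mean = sum(w * v for v, w in zip(values, weights))
    var = sum(w * (v - mean) ** 2 for v, w in zip(values, weights))
    return mean, var ** 0.5


eps = emission_rate_grid()
weights = true_parameter_weights()
mean_eps, std_eps = mean_and_std(eps, weights)

print("number of grid points:", len(eps))
print("trace of density matrix:", sum(weights))
print("mean emission rate:", round(mean_eps, 6))
print("standard deviation:", round(std_eps, 6))
```

Under these values the true distribution is sharply peaked at $\epsilon_{17} = 0.88$, so the true parameter is nearly deterministic, whereas a nominal distribution would typically be spread more broadly over the grid.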

Fig. 3. For the spontaneous emission model of the atom, (a) the true and nominal parameter distributions, (b) the 
histogram of the averaged total estimation errors, (c) sample paths of the nominal risk-sensitive estimator (solid line) 
and the true risk-neutral one (thick dotted line), and (d) sample paths of the nominal risk-neutral estimator (solid line) 
and the true risk-neutral one (thick dotted line). For the figures (c) and (d), the notation (t) is omitted. 

Remark 7.4: While in this paper we have considered the estimation problem over the finite-time 
horizon, let us here look at the asymptotic behaviour as $l \to \infty$ of the following quantity 

\[ \delta_l = |\pi_l^{\rm true}(X_e) - \pi_l^{\rm nom}(X_e)|, \]

where $\pi_l^{\rm true}(X_e)$ and $\pi_l^{\rm nom}(X_e)$ correspond to the standard risk-neutral estimator for the true 
and nominal initial states, respectively. If $\lim_{l \to \infty} \delta_l = 0$ for all observables $X_e$, then we say 
the filter is stable. Recently, Van Handel [22] has provided the following characterization for 
filter stability in continuous time. For all $X_e$ included in the observable space 

\[ \mathcal{O} = \mathrm{span}\{ C^{c_1} J^{d_1} C^{c_2} \cdots C^{c_k} J^{d_k}(I) : k, c_i, d_i \geq 0 \}, \]

we have $\delta_l \to 0$. Here, $C$ and $J$ are the continuous-time analogues of the quantities defined in 
Eq. (26). Therefore, the filter is stable if $\dim\mathcal{O} = \dim\mathcal{A}$. 
In our examples the observable spaces are given by 

\[ \mathcal{O}^{\rm dis} = \mathrm{span}\{I, \sigma_z\}, \qquad \mathcal{O}^{\rm spon} = \mathrm{span}\{I, \sigma_x, \sigma_z\}. \]

Therefore, for a dispersive interaction where we estimate $\sigma_z \in \mathcal{O}^{\rm dis}$, it is guaranteed by Van 
Handel's theorem that $\pi_l^{\rm nom}(\sigma_z)$ with any initial state converges to the true estimator. On the 
other hand, in the spontaneous decay case, since $\sigma_y \notin \mathcal{O}^{\rm spon}$, we cannot expect that $\delta_l \to 0$. 
This could be the reason why the increase in performance by the nominal risk-sensitive estimator 
over the risk-neutral one is more pronounced in Fig. 3 than in Fig. 1. We must note here that 
in simulations we do see that, with the settings used in Fig. 3, $\pi_l^{\rm nom}(\sigma_y)$ eventually converges 
to the true value 0. However, this convergence is very slow. 
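
The membership statements used in this remark can be verified with a few lines of linear algebra. The sketch below is an illustration only, with hypothetical helper names. It assumes the standard Pauli matrix representation and relies on the fact that $I, \sigma_x, \sigma_y, \sigma_z$ are mutually orthogonal under the Hilbert-Schmidt inner product, so the component of an observable lying outside a span of Pauli matrices is obtained by subtracting its projections.

```python
# Minimal sketch (hypothetical helper names), assuming the standard Pauli matrices.
I2 = [[1, 0], [0, 1]]
SX = [[0, 1], [1, 0]]
SY = [[0, -1j], [1j, 0]]
SZ = [[1, 0], [0, -1]]


def hs_inner(a, b):
    """Hilbert-Schmidt inner product Tr(a^dagger b) for 2x2 matrices."""
    return sum(complex(a[i][j]).conjugate() * b[i][j]
               for i in range(2) for j in range(2))


def residual_norm(x, basis):
    """Norm of the part of x orthogonal to span(basis).

    The basis elements must be mutually orthogonal in the
    Hilbert-Schmidt sense, which holds for subsets of {I, SX, SY, SZ}.
    """
    r = [[complex(x[i][j]) for j in range(2)] for i in range(2)]
    for b in basis:
        coeff = hs_inner(b, x) / hs_inner(b, b)
        r = [[r[i][j] - coeff * b[i][j] for j in range(2)] for i in range(2)]
    return abs(hs_inner(r, r)) ** 0.5


def in_span(x, basis, tol=1e-12):
    """True if x lies in the linear span of the given orthogonal basis."""
    return residual_norm(x, basis) < tol


O_DIS = [I2, SZ]        # observable space of the dispersive model
O_SPON = [I2, SX, SZ]   # observable space of the spontaneous emission model

print("sigma_z in O_dis: ", in_span(SZ, O_DIS))
print("sigma_y in O_spon:", in_span(SY, O_SPON))
print("dim O_dis, dim O_spon:", len(O_DIS), len(O_SPON))
```

Both observable spaces have dimension smaller than four, the dimension of the full algebra of $2 \times 2$ matrices, so convergence is guaranteed only for observables inside the respective span; $\sigma_y$ has no component in $\mathcal{O}^{\rm spon}$ at all.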